Some of the earliest and longest-running uses of Hausa on the internet are on websites associated with the Hausa services of the shortwave radio stations of five countries: UK (BBC), US (VOA), Germany (RDW), China (CRI), and France (RFI). This post looks at how they treat Hausa text with regard to the standard "Boko" orthography, and at the implications of the choices they make.
Background
The Hausa language, as written with the Latin-based Boko alphabet, includes several "hooked" characters that represent sounds not part of English or French, and which affect pronunciation and meaning as much as accents do in many European languages. These are ɓ, ɗ, ƙ, and, in Niger, ƴ (in Nigeria, 'y is written for the same sound). The capital letter forms of the four hooked letters are Ɓ Ɗ Ƙ Ƴ.
There is a fair amount written and even published using this alphabet, which largely (but not completely) supplanted the Arabic-based "Ajami" beginning in the colonial period in Nigeria. It later influenced discussions on standardizing African alphabets (more on that in a previous post).
The advent of personal computers and then the internet, which relied first on the ASCII standard (basically the English alphabet) and then on 8-bit encodings that supported a slightly wider range of languages but were not mutually compatible, hampered use of alphabets like that of Hausa. (There had been some typewriter solutions for such alphabets, so in this sense one could argue that information technology was at first a step back for some languages.)
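To make the encoding limitation concrete, here is a minimal Python sketch that checks whether the hooked letters can be represented in ASCII, in one common 8-bit encoding (ISO-8859-1), and in UTF-8. The names `HOOKED` and `encodable` are illustrative only.

```python
# The eight hooked letters of the Boko alphabet (lower and upper case).
HOOKED = "ɓɗƙƴƁƊƘƳ"

def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for enc in ("ascii", "iso-8859-1", "utf-8"):
    print(enc, encodable(HOOKED, enc))
# ascii False
# iso-8859-1 False
# utf-8 True
```

All eight letters have code points above U+00FF, which is why neither ASCII nor ISO-8859-1 has room for them.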
It was during this trough before Unicode became widely used - which would enable full use of extended Latin characters such as the hooked letters (as well as other writing systems) - that the Hausa services of the shortwave radio operations of the BBC, VOA, and RDW started putting Hausa language content online. BBC World Service had Hausa online by 2000 if not earlier, and VOA and RDW by 2002. The only practical option in an environment where computer systems used by many people were not Unicode "aware" was to use an "ASCIIfied" transcription of Hausa (substituting ɓ with b, ɗ with d, ƙ with k, and ƴ or 'y with y).
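The substitution described above can be sketched in a few lines of Python. This is only an illustration of the mapping named in the text, not the procedure any of the broadcasters actually used; the function name `asciify` is my own.

```python
# One-to-one replacements for the hooked letters, as listed in the text.
_HOOKED_TO_ASCII = str.maketrans({
    "ɓ": "b", "ɗ": "d", "ƙ": "k", "ƴ": "y",
    "Ɓ": "B", "Ɗ": "D", "Ƙ": "K", "Ƴ": "Y",
})

def asciify(text: str) -> str:
    """Replace hooked letters with plain ones, and the digraph 'y with y."""
    plain = text.translate(_HOOKED_TO_ASCII)
    # Nigerian spelling writes 'y for the sound written ƴ in Niger.
    # Naive: this also touches any other apostrophe directly followed by y.
    return plain.replace("'y", "y").replace("'Y", "Y")

print(asciify("ɗaya"))  # daya
```

The mapping is many-to-one: once ɗ has become d, nothing in the output says which d was originally hooked, so the step cannot be undone by a simple reverse substitution.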
CRI's Hausa service had a web presence by 2006, and RFI inaugurated its Hausa service both on the air and on the internet in 2007.
Evolution of treatment of Hausa text on some sites
I first started informally tracking how these sites presented Hausa in 2006 (before RFI began its Hausa service), and in 2007 noted that all five operations used ASCIIfied transcriptions. The blog Hausa Online in 2006 made the same observation about VOA, but found that on the RDW site, "[t]he hooked letters ƙ ɓ ɗ are used in some, but unfortunately not in all cases."
In 2009 I took another look at BBC and VOA. BBC was still ASCIIfied, and its page encoding was iso-8859-1 rather than utf-8 - in other words, not anticipating Unicode. VOA was also still ASCIIfied; its page encoding was utf-8, but the language parameter was "lang=en" (English) rather than "lang=ha" (Hausa). It is worth noting that a 2010 report by Kantar Media for the BBC, entitled "Qualitative research on the BBC Hausa Service," had little to say about the website, and did not ask specifically about the quality of the web content in Hausa; the natural focus on the radio broadcasts was not fully complemented by attention to the web presentation.
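For readers who want to repeat this kind of check, the sketch below pulls the declared language and charset out of a page's markup using Python's standard `html.parser`. The class and function names and the sample markup are made up for illustration (the sample is not copied from any of the sites), and the sketch reads only the HTML itself; a charset sent in the HTTP headers would need a separate check.

```python
from html.parser import HTMLParser

class PageParams(HTMLParser):
    """Collect the lang attribute of <html> and the charset declared in <meta>."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "html" and self.lang is None:
            self.lang = a.get("lang")
        elif tag == "meta" and self.charset is None:
            if a.get("charset"):
                # Short form: <meta charset="utf-8">
                self.charset = a["charset"].lower()
            elif (a.get("http-equiv") or "").lower() == "content-type":
                # Long form: content="text/html; charset=..."
                content = (a.get("content") or "").lower()
                if "charset=" in content:
                    self.charset = content.split("charset=", 1)[1].split(";")[0].strip()

def page_params(html: str) -> tuple:
    parser = PageParams()
    parser.feed(html)
    return parser.lang, parser.charset

# Made-up markup showing a utf-8 page that still declares English.
sample = ('<html lang="en"><head><meta http-equiv="Content-Type" '
          'content="text/html; charset=utf-8"></head></html>')
print(page_params(sample))  # ('en', 'utf-8')
```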
By 2013, RDW seemed to be using the Boko orthography to a large degree, and VOA somewhat; the others were still ASCII.
The international shortwave radio websites today
On 9 September and again earlier today, I took another look at the websites of these five Hausa radio services, with these observations:
- Deutsche Welle (or RDW) http://www.dw.com/ha/ Appears to be the best with Hausa text (seems to use the correct orthography, with extended characters where appropriate, most or all of the time). Uses lang=ha (though it is not clear why this appears only in the block of conditional statements for Internet Explorer). Uses charset=utf-8. A search on "ɗaya" ("one") correctly returned results with "ɗaya".
- Voice of America (VOA) http://www.voahausa.com/ Appears to be inconsistent with orthography. Uses lang=ha and charset=utf-8. A search on "ɗaya" ("one") correctly returned results with "ɗaya".
- Radio France Internationale (RFI) http://ha.rfi.fr/ Appears to be inconsistent with orthography. Uses lang=ha and charset=utf-8. A search on "ɗaya" ("one") returned results with "ɗaya", along with "Haya" for some reason.
- British Broadcasting Corporation (BBC) http://www.bbc.com/hausa Appears not to use the extended characters at all (though it does use 'y, which is the Nigerian equivalent of what in Niger and other countries is ƴ). Uses lang=ha and charset=utf-8. A search on "ɗaya" ("one") returned results that ignored the letter "ɗ", as if the search term were "aya".
- China Radio International (CRI) http://hausa.cri.cn/ Appears not to use the extended characters at all. Has no lang= parameter. Uses charset=utf-8, except for a use of charset=iso-8859-1 in search, which returns confused results on searches with extended Latin.
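One way a charset mismatch can confuse a search is sketched below: the query leaves the browser as UTF-8 bytes but is read as ISO-8859-1 on the other end. This is an assumption about a possible mechanism, not something I have confirmed for CRI's search page, and `misdecode` is a name made up for the illustration.

```python
def misdecode(text: str, wrong: str = "iso-8859-1") -> str:
    """Encode text as UTF-8, then decode the same bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong)

query = "ɗaya"
garbled = misdecode(query)
print(repr(garbled))     # 'É\x97aya'
print(garbled == query)  # False
```

The single letter ɗ becomes two unrelated characters, so the string actually searched for no longer matches anything written with ɗ. Plain ASCII queries pass through unchanged, which would explain why the problem only shows up with extended Latin.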
Since this is the result of a quick review, with no extensive or statistical pretensions, I can only offer a subjective impression, including the above, and the tentative conclusion that RFI, at least, has improved in its use of the Boko orthography since 2013. It may be that RDW and VOA have also improved, but I can't measure that. It does not seem that either BBC or CRI has improved at all in this regard.
BBC and VOA did correct page parameter issues noted in 2009. In fact, all five are good with the charset=utf-8 parameter, with the exception of CRI's search page, and all but CRI have the correct lang=ha parameter.
Implications and recommendations
I still think that promulgation of ASCIIfied Hausa (not using the extended characters of the Hausa Boko orthography) will negatively affect the quality of corpora drawing on these resources, with secondary effects on development of terminologies and applications.
For whatever reason, two resources online that I recently learned of are inconsistent in their use of Hausa orthography (some Boko, some ASCIIfied): Microsoft Language Portal's Hausa terminology updated for Windows 10; and ImTranslator.net 's English-Hausa and Hausa-English online translation utility. Could it be that persistent use of ASCIIfied Hausa text on websites like some of those discussed above is getting picked up in corpora?
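A rough way to look for this kind of inconsistency in a body of text is to group words that differ only in their hooked letters and report the groups with more than one spelling. The Python sketch below does that; the function name and sample sentence are made up, the 'y digraph is not handled, and genuine pairs of distinct words that differ only by a hook would be flagged too, so the output is a list of candidates to review rather than a list of errors.

```python
import re
from collections import defaultdict

# Fold the lower-case hooked letters onto their plain counterparts.
_FOLD = str.maketrans("ɓɗƙƴ", "bdky")

def mixed_spellings(text: str) -> dict:
    """Map each folded word to its spellings, keeping only words spelled more than one way."""
    forms = defaultdict(set)
    for token in re.findall(r"\w+", text.lower()):
        forms[token.translate(_FOLD)].add(token)
    return {key: found for key, found in forms.items() if len(found) > 1}

# Made-up sample mixing Boko and ASCIIfied spellings of the same word.
sample = "Labari na ɗaya. Labari na daya."
print(mixed_spellings(sample))
```

For the sample, the only entry returned is for "daya", with the two spellings "ɗaya" and "daya".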
Another negative outcome of continued use of ASCIIfied Hausa would be the impact on maintaining a quality written standard for education. Since the language has a standard orthography, it would behoove foreign entities to adhere to it, especially as the earlier technical barriers have fallen.
It would be useful in the meantime - and for that matter in the long run as well - to develop an app to "Boko-ify" Hausa text in ASCII.
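As a starting point, such a tool could be a simple dictionary lookup. The Python sketch below assumes a lexicon mapping ASCIIfied spellings to their Boko forms; the one-entry `LEXICON` here is a placeholder (a real tool would need a vetted Hausa wordlist), and `bokoify` is a hypothetical name.

```python
import re

# Placeholder lexicon: ASCIIfied spelling -> Boko spelling.
LEXICON = {"daya": "ɗaya"}

def bokoify(text: str, lexicon: dict = LEXICON) -> str:
    """Restore hooked letters in words found in the lexicon, preserving case."""
    def restore(match):
        word = match.group(0)
        target = lexicon.get(word.lower())
        if target is None:
            return word                      # unknown word: leave untouched
        if len(word) > 1 and word.isupper():
            return target.upper()            # ALL CAPS
        if word[0].isupper():
            return target[0].upper() + target[1:]  # Capitalized
        return target
    return re.sub(r"\w+", restore, text)

print(bokoify("Labari na daya"))  # Labari na ɗaya
```

A lookup like this only works for words with a single possible Boko form. Where the same ASCIIfied spelling could stand for more than one word, the choice depends on context, and the tool would need something more than a wordlist (or a human reviewer) to decide.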
#acentúate & #ɓɗƙƴ
Noting the current publicity campaign for using accent marks in various European languages, which has a hashtag on Twitter of #acentúate, might a similar approach help for raising awareness online about Hausa orthography? Maybe something like #ɓɗƙƴ?
"#Twitter: Las etiquetas pueden usar tildes #Acentúate." (In English: "#Twitter: Hashtags can use accent marks #Acentúate.") https://t.co/Vyh9nn4dPw pic.twitter.com/leo9I7IxXT
- UEES Online (@uees_online), September 18, 2015
(Some items mentioned in this post appeared on the Hausa charsets and keyboards message board.)