Sunday, September 27, 2015

Hausa on the international radio websites

Some of the earliest and longest-running uses of Hausa on the internet are on websites associated with the Hausa services of the shortwave radio stations of five countries: UK (BBC); US (VOA); Germany (RDW); China (CRI); and France (RFI). This post looks at how they treat Hausa text with regard to the standard "Boko" orthography, and at the implications of the choices they make.

Background


The Hausa language, as written with the Latin-based Boko alphabet, includes several "hooked" characters that represent sounds not found in English or French, and which affect pronunciation and meaning as much as accents do in many European languages. These are ɓ, ɗ, ƙ, and, in Niger, ƴ (in Nigeria, 'y is written for the same sound). The capital forms of the four hooked letters are Ɓ Ɗ Ƙ Ƴ.
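For reference, each hooked letter is a distinct Unicode character in its own right, not an ordinary letter with a separate mark attached. The short Python sketch below (the name `hooked_table` is purely illustrative) lists each lowercase letter and its capital with its code point and official Unicode name, using only the standard `unicodedata` module and the built-in case mapping of `str.upper()`.

```python
import unicodedata

# The four hooked letters of Hausa Boko, lowercase (ƴ being the Niger usage).
HOOKED_LOWER = "ɓɗƙƴ"

def hooked_table():
    """Return (code point, character, Unicode name) rows for each letter and its capital."""
    rows = []
    for ch in HOOKED_LOWER:
        for c in (ch, ch.upper()):  # str.upper() applies Unicode case mapping
            rows.append((f"U+{ord(c):04X}", c, unicodedata.name(c)))
    return rows

for row in hooked_table():
    print(*row)  # e.g. "U+0257 ɗ LATIN SMALL LETTER D WITH HOOK"
```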

There is a fair amount written and even published using this alphabet, which largely (but not completely) supplanted the Arabic-based "Ajami" beginning in the colonial period in Nigeria. It later influenced discussions on standardizing African alphabets (more on that in a previous post).

The advent of personal computers and then the internet hampered the use of alphabets like that of Hausa. These relied first on the ASCII standard (basically the English alphabet) and then on 8-bit encodings that supported a somewhat wider range of languages but were not compatible with one another. (There had been some typewriter solutions for such alphabets, so in this sense one could argue that information technology was at first a step back for some languages.)

It was during this trough, before Unicode came into wide use (Unicode would enable full use of extended Latin characters such as the hooked letters, as well as of other writing systems), that the Hausa services of the BBC, VOA, and RDW shortwave radio operations started putting Hausa-language content online. BBC World Service had Hausa online by 2000 if not earlier, and VOA and RDW by 2002. In an environment where many people's computer systems were not Unicode-"aware," the only practical option was an "ASCIIfied" transcription of Hausa (replacing ɓ with b, ɗ with d, ƙ with k, and ƴ or 'y with y).
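The ASCIIfication described above amounts to a simple substitution table, sketched here in Python. This illustrates the mapping only; it is not a reconstruction of any broadcaster's actual tooling, and it assumes the Nigerian 'y is typed with the plain ASCII apostrophe (some texts use other apostrophe-like characters). Because distinct Boko letters collapse onto the same ASCII letter, the process cannot be reversed mechanically.

```python
# Substitutions for "ASCIIfied" Hausa, as described in this post.
# Assumption: the Nigerian 'y uses the plain ASCII apostrophe. This naive
# rule would also alter an ordinary single quote placed before a y.
ASCII_SUBS = [
    ("'y", "y"), ("'Y", "Y"),
    ("ɓ", "b"), ("Ɓ", "B"),
    ("ɗ", "d"), ("Ɗ", "D"),
    ("ƙ", "k"), ("Ƙ", "K"),
    ("ƴ", "y"), ("Ƴ", "Y"),
]

def asciify(text):
    """Replace hooked letters (and 'y) with plain ASCII letters.
    Many-to-one: the original Boko spelling cannot be recovered from the result."""
    for hooked, plain in ASCII_SUBS:
        text = text.replace(hooked, plain)
    return text

print(asciify("ɗaya"))                     # daya
print(asciify("ɗaya") == asciify("daya"))  # True: the distinction is gone
```

The second line of output is the crux: once "ɗaya" and "daya" map to the same string, anything built on the ASCIIfied text (search, corpora, spell-checkers) can no longer tell them apart.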

CRI's Hausa service had a web presence by 2006, and RFI inaugurated its Hausa service both on the air and on the internet in 2007.

Evolution of treatment of Hausa text on some sites


I first started informally tracking how these sites presented Hausa in 2006 (before RFI began its Hausa service), and in 2007 noted that all five operations used ASCIIfied transcriptions. The blog Hausa Online in 2006 made the same observation about VOA, but found that on the RDW site, "[t]he hooked letters ƙ ɓ ɗ are used in some, but unfortunately not in all cases."

In 2009 I took another look at BBC (still ASCIIfied; I also noted that the page encoding was iso-8859-1, not utf-8, in other words not anticipating Unicode) and VOA (also still ASCIIfied; although its pages were encoded in utf-8, the language parameter was "lang=en", i.e., English, not "lang=ha", Hausa). It is worth noting that a 2010 report by Kantar Media for the BBC, entitled "Qualitative research on the BBC Hausa Service," had little to say about the website, and in fact did not ask specifically about the quality of its Hausa web content (the natural focus on the radio broadcasts did not seem to be matched by comparable attention to the web presentation).

By 2013, RDW seemed to be using the Boko orthography to a large degree, and VOA somewhat; the others were still ASCII.

The international shortwave radio websites today


On 9 September and again earlier today, I took another look at the websites of these five Hausa radio services, with these observations:
  • Deutsche Welle (or RDW) http://www.dw.com/ha/ Appears to be the best with Hausa text (seems to use the correct orthography with extended characters where appropriate most or all of the time). Uses lang=ha (though it is not clear why only in the battery of commands for Internet Explorer). Uses charset=utf-8. A search on "ɗaya" ("one") correctly returned results with "ɗaya".
  • Voice of America (VOA) http://www.voahausa.com/ Appears to be inconsistent with orthography. Uses lang=ha and charset=utf-8. A search on "ɗaya" ("one") correctly returned results with "ɗaya".
  • Radio France Internationale (RFI) http://ha.rfi.fr/ Appears to be inconsistent with orthography. Uses lang=ha and charset=utf-8. A search on "ɗaya" ("one") returned results with "ɗaya" along with "Haya" for some reason.
  • British Broadcasting Corporation (BBC) http://www.bbc.com/hausa Appears not to use the extended characters at all (though it does use 'y, the Nigerian equivalent of what in Niger and other countries is written ƴ). Uses lang=ha and charset=utf-8. A search on "ɗaya" ("one") returned results that ignored the letter "ɗ", as if the search term were "aya".
  • China Radio International (CRI) http://hausa.cri.cn/ Appears not to use the extended characters at all. No lang= parameter at all. Uses charset=utf-8, except for a use of charset=iso-8859-1 in search, which returns confused results on searches with extended Latin.
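How these search pages are actually implemented is unknown, but two simple mechanisms would produce symptoms like the BBC and CRI results above. The Python sketch below is hypothetical: `drop_non_ascii` models a search box that silently discards characters outside ASCII, and `misdecode` models UTF-8 input being read back as ISO-8859-1.

```python
def drop_non_ascii(query):
    # Hypothetical: a search front end that discards any non-ASCII character
    # before matching, so "ɗaya" is searched as "aya".
    return "".join(c for c in query if c.isascii())

def misdecode(query):
    # Hypothetical: UTF-8 bytes from the browser are decoded as ISO-8859-1,
    # so the two-byte sequence for ɗ (C9 97) becomes two unrelated characters.
    return query.encode("utf-8").decode("iso-8859-1")

print(drop_non_ascii("ɗaya"))   # aya
print(repr(misdecode("ɗaya")))  # 'É\x97aya'
```

In the second case the original bytes are still present, so encoding the garbled string back to ISO-8859-1 and decoding it as UTF-8 recovers "ɗaya"; the real remedy is to declare and use UTF-8 consistently, including on the search page.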

Since this is the result of a quick review, with no extensive or statistical pretensions, I can only offer a subjective impression, including the observations above, and the tentative conclusion that RFI, at least, has improved in its use of the Boko orthography since 2013. RDW and VOA may also have improved, but I can't measure that. Neither BBC nor CRI seems to have improved at all in this regard.

BBC and VOA did correct page parameter issues noted in 2009. In fact, all five are good with the charset=utf-8 parameter, with the exception of CRI's search page, and all but CRI have the correct lang=ha parameter.
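Checks like these can be partly automated. The Python sketch below uses only the standard library's `html.parser` to read the `lang` attribute of the `<html>` element and any charset declared in `<meta>` tags. The names `PageParams` and `page_params` are illustrative. It examines markup only (not the HTTP Content-Type header, which can also declare a charset), and it ignores anything inside HTML comments.

```python
from html.parser import HTMLParser

class PageParams(HTMLParser):
    """Collect the <html lang=...> value and any charsets declared in <meta> tags."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "html" and a.get("lang"):
            self.lang = a["lang"]
        elif tag == "meta":
            if a.get("charset"):
                # HTML5 form: <meta charset="utf-8">
                self.charsets.append(a["charset"].strip().lower())
            elif (a.get("http-equiv") or "").lower() == "content-type":
                # Older form: <meta http-equiv="Content-Type" content="text/html; charset=...">
                for part in (a.get("content") or "").split(";"):
                    key, _, value = part.strip().partition("=")
                    if key.lower() == "charset":
                        self.charsets.append(value.strip().lower())

def page_params(html):
    """Return (lang, [declared charsets]) for an HTML document given as a string."""
    parser = PageParams()
    parser.feed(html)
    parser.close()
    return parser.lang, parser.charsets

sample = '<html lang="ha"><head><meta charset="UTF-8"></head><body>ɗaya</body></html>'
print(page_params(sample))  # ('ha', ['utf-8'])
```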

Implications and recommendations


I still think that promulgation of ASCIIfied Hausa (not using the extended characters of the Hausa Boko orthography) will negatively affect the quality of corpora drawing on these resources, with secondary effects on development of terminologies and applications.

For whatever reason, two online resources that I recently learned of are inconsistent in their use of Hausa orthography (some Boko, some ASCIIfied): Microsoft Language Portal's Hausa terminology updated for Windows 10, and ImTranslator.net's English-Hausa and Hausa-English online translation utility. Could it be that persistent use of ASCIIfied Hausa text on websites like some of those discussed above is getting picked up in corpora?

Another negative outcome of continued use of ASCIIfied Hausa would be the impact on maintaining a quality written standard for education. Since the language has a standard orthography, it would behoove foreign entities to adhere to it, especially as the earlier technical barriers have fallen.

It would be useful in the meantime - and for that matter in the long run as well - to develop an app to "Boko-ify" Hausa text in ASCII.
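One very simple starting point for such an app is dictionary lookup: replace each ASCIIfied word with its Boko spelling when a word list says so. The Python sketch below assumes a hypothetical `LEXICON` whose only entry, "daya" → "ɗaya", is the example used earlier in this post; a usable tool would need a large word list drawn from correctly spelled Boko text. Lookup alone cannot decide cases where both the hooked and unhooked spellings are real words; those would need context.

```python
import re

# Hypothetical lexicon: ASCIIfied spelling -> Boko spelling.
# The single entry is the example from this post; a real tool would need
# a large word list built from correctly spelled Boko text.
LEXICON = {
    "daya": "ɗaya",  # "one"
}

# In this sketch, runs of ASCII letters and apostrophes count as words.
WORD = re.compile(r"[A-Za-z']+")

def match_case(template, word):
    """Copy the capitalization pattern of `template` (all caps / initial cap) onto `word`."""
    if template.isupper():
        return word.upper()
    if template[:1].isupper():
        return word[:1].upper() + word[1:]
    return word

def bokoify(text, lexicon=LEXICON):
    """Restore Boko spellings for words found in the lexicon; leave other words unchanged."""
    def restore(match):
        word = match.group(0)
        boko = lexicon.get(word.lower())
        return match_case(word, boko) if boko else word
    return WORD.sub(restore, text)

print(bokoify("Daya, DAYA, daya."))  # Ɗaya, ƊAYA, ɗaya.
```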

#acentúate & #ɓɗƙƴ


Noting the current publicity campaign for using accent marks in various European languages, which has a hashtag on Twitter of #acentúate, might a similar approach help for raising awareness online about Hausa orthography? Maybe something like #ɓɗƙƴ?




(Some items mentioned in this post appeared on the Hausa charsets and keyboards message board.)

Thursday, September 24, 2015

The Casablanca Statement in Tifinagh

In the previous entry, I posted the English and French originals of the 2005 Casablanca Statement on localization in Africa, plus translations in Arabic, N'Ko, and Portuguese. Since then, I found the Amazigh (Berber) translation, which was provided in Tifinagh script. It is included below as an image file (the text was in a pre-Unicode font).

I have also retrieved a backup version of the PanAfrican L10n wiki, and am working on getting it into presentable form on a provisional location before making it public. The wiki was started just before the Casablanca meeting a little over 10 years ago, and was developed under the PAL and ANLoc projects before going offline with the ANLoc site in late 2013.



Wednesday, September 09, 2015

The Casablanca Statement, 2005

A little more than ten years ago, the first PanAfrican Localisation (PAL) Workshop, held in Casablanca, Morocco on 13-15 June 2005, concluded with a short statement or declaration. I am reposting it here with the thought that a decade later, it is worth revisiting to measure how much has and has not been accomplished, and consider what might be next.

The Canadian International Development Research Centre (IDRC) funded the PAL project in 2005-08, followed by the African Network for Localisation (ANLoc) in 2008-11. Although ANLoc continues as a loose network, there is currently no organization or donor support comparable to what there was before.

The Casablanca Statement is included below in English and French - the two working languages of the workshop in June 2005 - as well as in Portuguese, Arabic, and N'Ko translations. There was talk of translations into other African languages, but as far as I know, no others have yet been done.



Casablanca Statement

African localisation experts met in Casablanca in a workshop organised by Kabissa with Bisharat under IDRC funding, and in collaboration with MTDS and the Casablanca Technopark centre. The event benefitted from contributions from the Moroccan Minister-Delegate to the Prime Minister in Charge of General and Economic Affairs, the Canadian Ambassador to Morocco, and experts from other continents.

After three days of work, the participants in the meeting reached the following conclusions:
  1. Limiting people to the use of information and communication technology (ICT) in a foreign language tends to exacerbate the digital divide; makes ICT adoption long, difficult, and expensive; and impoverishes local culture.
  2. Localisation makes ICT more accessible to everybody, including users from rural areas and young students, reinforcing the importance of our culture and helping us preserve our identity.
  3. Localisation of ICT into indigenous African languages is therefore key to rapid and fair development in Africa.
  4. For localisation to succeed and have its maximum impact in society, collaboration among governments, civil society, educators, linguists, computer professionals, standards organisations and development agencies is necessary.
We, the participants, commit ourselves to promoting this vision and working towards social development in Africa through ICT localisation.

Casablanca, 15 June 2005


Déclaration de Casablanca

Des experts Africains en localisation se sont réunis à Casablanca lors d'un atelier organisé par Kabissa avec Bisharat, financé par le CRDI et avec la collaboration de MTDS et du centre Technopark de Casablanca. La rencontre a profité de la collaboration du Ministre délégué auprès du Premier Ministre chargé des Affaires économiques et générales du Maroc, de l'ambassadeur du Canada au Maroc, et d'experts d'autres continents.

Après trois jours de travaux, les participants ont conclu comme suit :
  1. Restreindre les utilisateurs des technologies de l'information et de la communication (TIC) à une langue étrangère tend à exacerber la fracture numérique, rend l'adoption des TIC longue, difficile et onéreuse, et enfin appauvrit les cultures locales.
  2. La localisation rend les TIC accessibles à tous, y compris aux utilisateurs ruraux et aux écoliers ; elles renforcent l'importance de notre culture et nous aident à garder notre identité.
  3. La localisation des TIC en langues indigènes d'Afrique est par conséquent une des clés du développement rapide et équitable en Afrique.
  4. Le succès et l'impact maximal de la localisation dans la société exige la collaboration des gouvernements, de la société civile, des professionnels de l'éducation et de l'informatique, des linguistes, des organismes de normalisation et des agences de développement.
Nous, les participants, nous engageons à promouvoir cette vision et à travailler au développement social de l'Afrique par le biais de la localisation des TIC.

Fait à Casablanca le 15 juin 2005


Declaração de Casablanca

Peritos em localização de várias partes de África reuniram-se em Casablanca para participarem de um seminário organizado por Kabissa e Bisharat, financiado pelo IDRC/CRDI em colaboração com MTDS e o Centro Technopark de Casablanca. O evento teve o benefício de contribuições pelo Secretário do Primeiro-ministro Marroquino para Assuntos Económicos e Gerais, o Embaixador Canadense em Marrocos e peritos de outros continentes.

Após três dias do trabalho, os participantes na reunião chegaram às seguintes conclusões:
  1. Limitar as pessoas ao uso da tecnologia de informação e de comunicação (TIC) numa língua estrangeira, por regra aumenta o fosso digital; torna a adopção de TICs um processo longo, difícil, e caro; e empobrece a cultura local.
  2. Localização torna TIC mais acessível a todos, incluindo usuários das áreas rurais e estudantes jovens, reforçando a importância da cultura própria e ajudando a preservar a identidade.
  3. A localização de TIC em línguas africanas indígenas é consequentemente chave ao desenvolvimento rápido e justo em África.
  4. Para que localização seja um sucesso e tenha o maior impacto na sociedade, é necessária a colaboração entre governos, a sociedade civil, os educadores, os linguistas, os profissionais de informática, as organizações de padrões e as agências de desenvolvimento.
Nós, os participantes, cometemo-nos a promover esta visão e a desempenhar esforços para o desenvolvimento social em África através de localização de TIC.

Casablanca, aos 15 de Junho, 2005


بيان الدار البيضاء

التقى خبراء التطويع والتعريب الأفريقيون بالدار البيضاء في ورشة عمل تحت رعاية كابيسا (Kabissa) وبشارات بتمويل من مركز بحوث التنمية الدولي (IDRC)، وبالتنسيق مع MTDS والمنتزه التقني "تكنوبارك الدار البيضاء". وحقق هذا الحدث استفادة رائعة من المشاركات والإسهامات التي قدمها الوزير المغربي المفوض من قِبل رئيس الوزراء والمعني بالشؤون الاقتصادية والنواحي العامة، والسفير الكندي بالمغرب، بالإضافة إلى خبراء من قارات أخرى.
وبعد ثلاثة أيام من العمل، توصل المشاركون في الاجتماع إلى النتائج التالية:
  • يؤدي قصر استخدام الفرد لتكنولوجيا المعلومات والاتصالات (ICT) على لغة أجنبية دون اللغة المحلية إلى تفاقم الفجوة الرقمية واتساعها، وزيادة الفترة الزمنية المستغرقة لنشر استخدام تكنولوجيا المعلومات والاتصالات وصعوبة تحقيق ذلك مع ارتفاع في التكلفة، إضافًة إلى إضعاف الثقافة المحلية.
  • يعمل التطويع والتعريب على إتاحة إمكانية استخدام تكنولوجيا المعلومات والاتصالات للجميع، بما في ذلك المستخدمون من المناطق الريفية والطلبة صغار السن؛ مما يؤكد على أهمية حضارتنا ويدعمها ويتيح لنا الاحتفاظ بهويتنا الثقافية.
  • يعد تطويع تكنولوجيا المعلومات والاتصالات وتعريبها إلى اللغات الأفريقية القومية، بالنظر لما سبق، عاملاً أساسيًا للتطوير والتنمية بخطى سريعة وبصورة عادلة في القارة الأفريقية.
  • يجب على كل من الحكومات والمجتمع المدني والعاملين بمجال التعليم واللغة، وكذلك محترفي الكمبيوتر وهيئات التطوير والتنمية والمؤسسات الرسمية التعاون والتضامن لإنجاح صناعة التطويع والتعريب وإتيان ثمارها على النحو المرجو منها.
ونلتزم نحن المشاركون بتعزيز هذه الرؤية ودعمها والعمل نحو تحقيق مزيد من التنمية الاجتماعية في أفريقيا من خلال تطويع وتعريب تكنولوجيا المعلومات والاتصالات.
الدار البيضاء، 15 يونيو 2005





English & French versions came out of the Workshop. Portuguese translated by Mr. Rui Correia. Arabic translated by Dr. Adel El Zaim. N'Ko translated by Prof. Baba Mamady Diané (text was pre-Unicode).

Wednesday, September 02, 2015

On diacritics & modified characters in African languages

Various past posts on Beyond Niamey have touched on aspects of Latin-based orthographies of many African languages, especially "extended Latin," which is a technical term for characters and diacritics beyond the basic letters we use in English (and the common accented characters used in major European languages). Since I anticipate returning to this topic in some future posts, I thought I'd reach back to some background I wrote on the old A12n-Collaboration list in 2006.

The context was a request from Kasahorow in their June 2006 newsletter for feedback on the topic of diacritics. Keep in mind that the technical context of this issue has changed remarkably over the past decade, to the point where the "nightmare" of diacritics for electronic publishers is no longer such a problem. Nevertheless, the question of why use diacritics (or modified characters) at all is probably one that still gets asked, regardless of improvements in how software handles complex scripts.
Yoruba and Igbo are classic examples of diacritic heaven. However, in reality what works well for linguists and non-native learners of these two languages is a nightmare for electronic publishers.

We are sampling a professional opinion to further understand whether the diacritics are a vestigial crutch for the non-native transcribers who first set these languages to text or a necessary part of the textual representation of these recently alphabetized languages.

It is often the case that writing in languages with young written histories closely follows the oral form hence the need for diacritics to preserve the tone variations of the spoken word. However, the problem of ambiguity in the absence of diacritics can be solved with new writing techniques. (from Kasahorow, reproduced in posting on A12n-Collab, 16 June 2006)
My long reply on A12n-Collab (24 June 2006) addresses some assumptions about the origins of extended and complex Latin orthographies for many African languages, making reference to some of the colonial and post-independence history. I've layered in links to various topics discussed, and added two footnotes:

Here's a stab at responding to Kasahorow's request for feedback on the issue of diacritics in African language transcriptions.

First I'd want to put the issue in historical perspective since discussions on orthographies in Africa generally, and for the specific languages mentioned (Yoruba, Igbo), go back decades or in some cases well over a century. It is true that the Latin-based transcriptions of African languages have their origins in the colonial period or just before, spearheaded by missionaries and in some cases colonial administrators, but this should not discredit them (I'll touch on this issue later, below).

Yoruba is an interesting case, since a Yoruba, Samuel Ajayi Crowther, played a major role in developing the Latin orthography for the language before the partition of Africa. A major feature of the alphabet was the use of small vertical lines under 3 letters that had different sounds than in English (generally now we see dots under). Another feature is tone marking, since among tonal languages, Yoruba is apparently one of the more complex.

Igbo, for which a standard orthography was adopted almost a century later (c. 1960), was also apparently written in the late 1800s. It also used marks under two letters and a dot over an n to distinguish meaningfully different sounds, and it also has significant tones.

By the 1920s there were more concerted efforts to devise consistent orthographies. One notable résumé of this effort is the Practical Orthography of African Languages (1930)*, which can be read at http://www.bisharat.net/Documents/poal30.htm. Many of the additional letters proposed were evidently adopted directly from the International Phonetic Alphabet, which had its origins in the late 19th century. It is of interest to note that this proposed orthography appears to have been a major influence on later African discussions of transcription of the continent's languages, such as Bamako 1966 and Niamey 1978.

The text of the Practical Orthography is in some places unfortunately phrased in the condescending language of colonialism, but there are some observations that stand the test of time, such as part of the rationale for modified Latin letters rather than diacritics to mark different sounds: "For practical purposes in everyday life diacritic marks constitute a difficulty and a danger. In the first place it is found that in current writing these marks are liable to be altered so as to be unrecognizable and even omitted altogether, as every one who has had to read written texts in African languages will readily acknowledge. Such alterations and omissions of diacritic marks are also frequently found in print. ..."

Such a situation is probably typical of young writing systems (others may want to comment), where there are not the cultural and pedagogical stays that support continued use of, say, accent marks in French or vowel marks in Arabic. I would offer that it is not so much the diacritics themselves that are the problem, but whether the educational system teaches their use consistently.
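The 1930 argument has a digital counterpart. When software or a careless conversion strips combining diacritics, letters written with a diacritic lose their distinction, while modified letters survive because they are base characters in their own right. The Python sketch below illustrates this with Unicode canonical decomposition (NFD) and the "Mn" (non-spacing mark) category; `strip_marks` is an illustrative name, not a standard function.

```python
import unicodedata

def strip_marks(text):
    """Decompose (NFD), drop non-spacing combining marks (category Mn), recompose (NFC).
    This imitates software that loses diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)

print(strip_marks("ọ"))     # o     (o with dot below: the dot is a combining mark and is lost)
print(strip_marks("ɔ"))     # ɔ     (open o: a modified letter with no decomposition, survives)
print(strip_marks("ɗaya"))  # ɗaya  (the Hausa hooked letters are likewise base characters)
```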

Note in any event that the use of this proposed African alphabet of 1930 did not supersede the original one for Yoruba; it also did not get established in southern Africa. For all the inconvenience that the undermarks (be they dots or lines of whatever sort) presented for typographers then, it apparently was not enough to lead to their abandonment in favor of the Practical Orthography. By now, there is a lot more material in Yoruba using the orthography developed in the 19th century. (On the other hand, Yoruba tongues spoken in Benin are written using a system related to the Practical Orthography and rules discussed in Bamako 1966, etc.) I'll return to the issue of tone marks shortly.

In the case of Igbo, the writing system was apparently standardized somewhat later, but it is worth noting that the effort in 1998 of one Nigerian linguist to introduce a different orthography for Igbo in his publication of a dictionary has met with a very unfavorable reaction, at least on the part of experts.

This leads one back to part of the question in the Kasahorow newsletter: whether such writing systems really work more for the linguists and language experts than for wider use. I think this is not a helpful dichotomy. Even historically young writing systems such as those of Yoruba and Igbo have already become part of the culture in certain ways. This is probably a major reason why the Practical Orthography did not replace the use of dot-under characters in southern Nigeria 70 years ago: the "look" of Yoruba or of Igbo, probably even to those not literate in those languages, included and still includes marks under.

There is a point of view that not only the diacritics are problematic, but the extended characters too - basically anything more than the alphabet as used in English or perhaps French. I once had occasion to speak with the (American) co-author of a bilingual dictionary for a West African language who expressed the opinion that the extended characters in the official orthography were "silly" and asked rhetorically why they don't just use combinations of ASCII characters. (Some foreigners dismiss attention to whole languages in such terms, but here was someone dismissing only the orthography chosen for one of the languages.)

Ultimately I think it is helpful to remember that in addition to the prominent role of non-African experts in the development of most orthographies for African languages, non-Africans may also be prominent in offering negative opinions about it all. In fact it may be that non-Africans actually dominate both sides of any discussion on the utility of such matters as diacritic characters, extended characters, and tone marks. (I admittedly may be implicated in this too, though I do try to encourage discussion in which Africans dominate - part of the reason for instance that I sought and posted various African documents on language policy, etc.)

This is a dynamic sadly common to so much of development and African studies over the years - outsider experts effectively dominate discussion and analysis of African development, ultimately occupying if not pre-empting both sides of any major debate. (If anyone's interested I can discuss this offlist.) At the same time it is important to point out that, as with Bishop Crowther and with the later UNESCO-sponsored conferences on African language transcription, Africans have not been bystanders in the development of Latin-based orthographies for their languages. The history of this needs to be formally explored in more detail (I'm aware of only one in-depth historical study that dealt with Hausa orthography**). In other words, what I'm trying to say by all of this is that the issue cannot be reduced to one of external imposition of complexity vs. unstated indigenous preference for something simpler.

As for tone marks, these may not be the big problem that people often think they are. Africa has many tonal languages and in many cases the need to mark tones is not that apparent. Bambara and Hausa for instance are commonly written in Latin transcription without tone marks since the context usually makes the meaning clear. Tone marks can be used to disambiguate text. We've discussed this matter some already on A12n-collab. Yoruba and Igbo may not be as forgiving in this matter as some other languages. Or maybe tone marks could be optional for ordinary text - mainly for disambiguation (is this the case for newspapers in Yoruba already?).
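If tone marks were treated as optional in ordinary text, software could remove them while keeping segmental marks such as the Yoruba dot below. The Python sketch below assumes the tone marks are the combining acute, grave, and macron accents (U+0301, U+0300, U+0304); that set is an assumption for illustration and would need checking against each language's orthography.

```python
import unicodedata

# Assumption for illustration: the combining acute (U+0301), grave (U+0300)
# and macron (U+0304) are treated as tone marks; other marks, such as the
# dot below (U+0323), are kept.
TONE_MARKS = {"\u0301", "\u0300", "\u0304"}

def drop_tones(text):
    """Remove tone marks while keeping other diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

print(drop_tones("Yorùbá"))        # Yoruba
print(drop_tones("\u1ecd\u0300"))  # ọ  (dot below kept, grave removed)
```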

There was a proposal made by Prof. Constancio K. Nakuma, a Ghanaian researcher on the Dagaare language of northern Ghana and southern Burkina Faso, to use a new set of spelling conventions to indicate tones in text in that language (one of these as I recall was insertion of an "h" after vowels to indicate high tone). Not sure if this is the "new approach" Kasahorow alludes to. There is perhaps merit to this particular approach, but it is not widely understood and certainly introduction of such a system for languages that do not use it would lead to some confusion and arguably hinder development and literacy in these languages.

Part of the problem, as I've already mentioned, is that for the writing systems of many African languages, one is not starting with a blank slate. Proposing new systems - whether they be ingenious or dumbed-down - means undoing or discounting what has been in place for a while. It is not impossible, but it would certainly lead to lost time, even in the optimal case where there is broad agreement on the course of action to take.

And even with less widely spoken languages in proximity to more widely spoken ones, there may be issues of "harmonization" of transcriptions: does it make sense to have very different orthographies for diverse languages in a given geographic area when there will already be differences with the English / French / Portuguese used officially and in later schooling?

Anyway this has been a bit longwinded as there are a number of matters implied even by as simple a question as you pose. Returning to the starting point of the question, you actually raise the issue in the context not of the native-speaking user but the electronic publisher. This is a real issue but much more narrow and IMO not a significant enough issue to use to justify significant changes in a writing system. It may actually be the case that the problems encountered by typesetters today will be overcome well before any new writing system designed to avoid them is established.

In sum therefore, I'd suggest that:

  1. While it is good, and even necessary to have more African than foreign participation in discussions about (writing) African languages, the topic area is not new and there is a history within Africa with African dimensions that shouldn't be overlooked.
  2. The origin of the writing system needn't be an issue: Latin script replaced the indigenous writing systems of northern Europe via the Christian church, and it was adapted to the non-Romance languages by clergy and specialists; Arabic script was brought to the Sahel centuries ago and adapted to indigenous tongues by the few learned in Islam and the Arabic language; etc.
  3. Changing writing systems to something more logical is discussed for a lot of languages (including English and French), but in the end such a change makes less sense (and costs more) than making the best of what one has. It may be the "right" thing to do in some cases (Turkey shifted from Arabic to Latin script, for example, and script changes are discussed for a number of languages elsewhere), but it is probably best undertaken at the highest appropriate official level (usually a national government, in consultation with experts and the relevant community of speakers, and ideally also with the governments of neighboring states where the language is spoken).

Notes:
* The 1930 publication of the Practical Orthography of African Languages was the revised edition of the original 1928 publication.
** John Edward Philips, 2000, Spurious Arabic: Hausa and Colonial Nigeria, Madison, WI: African Studies Program, University of Wisconsin. A table from the book comparing Ajami and Boko Hausa alphabets is available online. An article available online offers a shorter treatment of topics from or related to the book: John Edward Philips, 2004, "Hausa in the Twentieth Century: An Overview," Sudanic Africa, 15: 55-84.

(See also on this blog, "More on standard orthographies of African languages," 16 Nov. 2013.)