Saturday, December 31, 2016

Revisiting an African language content strategy

While in Mali in late 1999 and early 2000, inspired in part by early research by FUNREDES on Languages and Cultures of the Internet, I began thinking about strategies for increasing African language web content. While recognizing, of course, that such content would come primarily from communities of speakers of African languages, as well as from internationally funded projects that were then beginning to think about how to use the internet for development, my motivation was to help create an environment favorable to the creation and use of that content.

ISOC's report, 8/2016
Recent discussion of one project and reading about another, each of which deals in different ways with content and communication, and reading the Internet Society's (ISOC) August 2016 report on "Promoting Content in Africa," prompt me to revisit this early effort and look at how things are playing out.

Looking forward from 1999

The basic idea in 1999 was to disaggregate approaches to internet content development in African languages, and consider how each could optimally contribute to the overall goal of greater presence of those languages in cyberspace. In 2003 I reworked that schema to share more widely - for example on the short-lived Africa Web Content Owner email list.1 Elements of this strategy were incorporated in different ways in later work, such as African Languages in a Digital Age (ALDA).

The main approaches were:
  1. Composition of text-based content (including where possible, digitization of works previously published in African languages)
  2. Translation of text-based content from other languages (leveraging the then emerging machine translation [MT] technology)
  3. Development of content in non-text formats (with specific reference to audio2)
It was recognized that production of text-based content from scratch, by composing material (#1 above), is an incremental and artisanal process. In other words, it takes time and effort to achieve modest results. This is especially the case for languages with younger written traditions that are not well supported in education or even sometimes fully supported in software (fonts were then a problem for certain writing systems, for example, and keyboards for them still are). No matter how fundamental text content was, it would be hard to keep pace with content creation in many non-African languages, leaving speakers of African languages with few opportunities to see their language online.

Therefore the possibility of taking existing texts in African languages - from published books or other printed materials - and putting them on the web was suggested as a way to give a small but significant boost to efforts to generate African language internet content. Such texts often have historical or cultural value, and may already be in a standard orthography (or in transcriptions that could be easily converted into one).3 A sustained effort to "weblish" these materials, according to this thinking, could quickly add quality material to what is available on the web in a number of languages, and more importantly, make many materials that are accessible only in university libraries more readily available to speakers of those languages. However, copyright protections limit the potential of this tactic (although there have been some sites that appear to have made some of these materials available online without permission).

Therefore, an emphasis was put on alternative ways to create new content, especially translation (#2 above) of various relevant, useful, and interesting material aided by MT, and content built around the spoken voice (#3), responding to oral dimensions of African cultures, as well as the low literacy rates in African languages.

MT in that era especially was mainly a hope for the future, and aside from a few experiments most advances in the 2000s were for pairs of major (mainly Europhone) languages. Nowadays the technology has improved, but the statistical methods that have been key in that evolution require language resources that do not exist for many languages (at least yet). As such, the contribution of MT to content development in African languages is still in the future. 

As for audio content, this did not emerge as a significant component on the internet (unless one counts songs or the sending of audio files as email attachments, both of which were transferred over the internet rather than presented as part of web content). But see the discussion regarding video sharing below.

Just how much African language content?

Through the 2000s, African language content seemed to grow only marginally and unevenly. A pair of studies published by Rifal in 2003 provided perspectives on this subject that are still useful:
I'm not aware of any more recent studies along these lines, but it may still be the case that there is a significant number of sites with at least some African language content, but that these are still mostly descriptions.

New kinds of content

The rise of social media, video sharing, and mobile devices over the last decade or so has changed how we think of and produce web content, opening new possibilities for African languages in cyberspace.

Social media, including blogs and wikis, makes the creation of content in any language easier. But in the case of many African languages, it also brings us face-to-face with other limitations in input systems (for extended Latin and non-Latin scripts), education (where schools use only Europhone languages so that people aren't familiar with writing their first languages), and incentive (where the audience for text-based content in less widely spoken African languages is perceived to be small).

Video in a way fulfills the old idea of audio content on the web, but with the obvious advantage of visual (though there are at least a few YouTube videos with static presentation - a picture or line of text - and full audio in one or another African language). What's missing as far as I can tell is a way to find videos in specific languages that does not rely on the producer having tagged it appropriately (which may not happen).

Mobile devices have changed how we access and interact with content, and consequently how content is designed and even conceived. They also have become the most common way for Africans in general to access the internet - proportionately more important, I believe, than on any other continent. What I don't have a sense of is how much content in African languages is developed with mobile devices in mind. On the other hand, the input limitations for some writing systems would certainly be an issue for use of some African languages in messaging, for example.

"Promoting Content in Africa," 2016

ISOC's recent report includes a look at structures to support development of content in Africa, including in African languages. It is encouraging to note the attention ISOC is giving in this report to the importance of African language content for internet use in Africa.

One of the recommendations ISOC has for promoting local content in Africa, including that in African languages, is to promote development of local infrastructure, including data centers, Content Delivery Networks, and Internet Exchange Points. This idea to in effect create a facilitating environment for creation of African language content is an interesting strategy, and would complement other efforts such as mentioned above.

1. The direct link to the post in the AWCO archives is apparently accessible only to subscribed group members. I've created an alternative presentation of it on my website. That post has more background.
2. An early consideration of audio and web content on AWCO mentions Native American interest in the topic, as well as a project in Mauritania (also available on my website).
3. Numerous transcriptions of histories and tales from before the adoption of current orthographies used systematic notation that generally corresponds directly to characters used today (1:1 or occasionally 2:1). I have encountered this for example in various older materials on Fula and Bambara. Cheikh Anta Diop's famous translations of scientific and European cultural texts into Wolof (1955) similarly used a regular transcription that predated the current standard orthography in Senegal.

Friday, December 30, 2016

A meeting with ACALAN in Bamako

At the end of last month while in Bamako, Mali, I had the chance to meet with Adama Samassekou, who is currently serving in an advisory capacity with the African Academy of Languages (ACALAN), and with the current executive secretariat of ACALAN - Dr. Lang Fafa Dampha, Senior Research and Program Officer, and acting Executive Secretary; Dr. Ojo Babajide Johnson, Senior Program and Project Officer; and Kossi Abassa, Finance and Administrative Officer.

ACALAN's office is now in the Hamdallaye ACI quarter of Bamako - when I previously visited it in May 2008, its office was in Koulouba. The organization is evidently rebuilding, with a search underway for a new executive secretary and other staff to hire. They were also preparing for a two-day meeting of the Technical and Scientific Committee (held 8-9 December), one of the major organs of ACALAN.

Other working structures of ACALAN include Vehicular Cross-Border Language Commissions for 12 African languages (a structure I mentioned on this blog 3 years ago), and the national language agencies in the various African countries.

They also have a number of projects including on terminology and lexicography, the linguistic atlas of Africa, cyberspace, interpretation and translation, collection of stories, and a graduate program in applied linguistics. While ACALAN is headquartered in Bamako, its Pan-African Center of Interpretation and Translation, and the Terminology and Lexicography Project are based in Dar-es-Salaam, Tanzania.

And ACALAN publishes various materials and reports, as well as an academic journal called Kuwala.

The meeting we had last month will hopefully facilitate collaboration in the future. 

It is also worth noting that the year we are about to conclude marks one decade since the ACALAN-sponsored Year of African Languages. That year, 2006, was also the year that ACALAN (founded in 2001) formally became the African Union's specialized language agency.

Wednesday, December 28, 2016

Mabati-Cornell Kiswahili Prizes 2016

The second annual Mabati-Cornell Kiswahili Prizes for African Literature were awarded earlier this month at Cornell University in Ithaca, New York. As discussed previously on this blog, the Mabati-Cornell is the only literary award going to writers publishing in African languages.

Mabati-Cornell, which was founded in late 2014 by Cornell faculty Dr. Mukoma Wa Ngugi and Caine Prize for African Writing director Dr. Lizzie Attree, recognizes literature in the Swahili language. Its sponsorship by the Kenyan company Mabati Rolling Mills led Mukoma Wa Ngugi to state that "the prize sets an historical precedent for African philanthropy by Africans and shows that African philanthropy can and should be at the centre of African cultural production."

This year's prizes (announced on 14 Dec. 2016) went to:
  • Idrissa Haji Abdalla (Tanzania), for Kilio cha Mwanamke (fiction #1)
  • Hussein Wamaywa (Tanzania), for Moyo Wangu Unaungua (fiction #2)
  • Ahmed Hussein Ahmed (Kenya), for Haile Ngoma ya Wana (poetry)
The 2015 Mabati-Cornell Kiswahili prizes went to:
  • Anna Samwel (Tanzania), for Penzi la Damu (fiction #1)
  • Enock Maregesi (Tanzania), for Kolonia Santita (fiction #2)
  • Mohammed K. Ghassani (Tanzania), for N'na Kwetu (poetry #1)
  • Christopher Bundala (Tanzania), for Kifaurongo (poetry #2)

Friday, December 02, 2016

Quick comments on language in Mali today

In transit on the way back from a quick 3 weeks in Mali as part of a short-term consultancy with the Mali Justice Project. More on the project at another time, hopefully, but here are some quick (and unfortunately superficial) observations regarding language in Mali while they're still fresh in the memory.

This was my first time in Mali since 2008. Bamako has grown considerably, and from observations and descriptions, there is a much larger urban middle class. However, that does not seem to have been accompanied by a shift to French in everyday language (as one might see in some other cities in Francophone states). Bambara seems to be spoken everywhere, with French and occasional other languages as well.

As an obvious foreigner, efforts to use Bambara are generally met positively or matter-of-factly (to the extent one's accent hasn't obscured the fact one is using the language). That was great, though I admit it almost got a bit disconcerting to have airport security shift out of role to banter with the toubab speaking broken Bambara.

Only one real opportunity to speak Fulfulde, and that with a colleague in the project office. I found though that the whoosh of Fulfulde I was able to call up (to my surprise) was hard to turn off at first. Part of that is shifting between too many languages for a former monolingual (even crossed wires between Bambara and Chinese once - an old occasional lapse I ascribe to the similar structures of the languages and how I think my brain handles them).

In traveling outcountry to Segou and Sikasso, also found Bambara easy to use in various settings. With Segou that is expected, since it is ethnically Bambara (and the center of a major precolonial Bambara kingdom), but even in Sikasso, the major city in the ethnically Senufo/Minianka region of Mali, there was no problem speaking Bambara with anyone from small merchants to heads of services. In fact, a 2-day project meeting of people involved in commerce, transportation, and services de contrôle in Sikasso worked mainly in Bambara after starting in French.

Not so much signage in Bambara, though on the road to Segou did notice a couple of signs with N'Ko (wasn't able to get photos, sorry). Orange - the phone company - had a TV ad with "I ni tié" (i ni cɛ ≈ thank you). Frenchified renditions of Bambara text are frequent in the written forms I saw in such short usage, which generally accompanied French.

Also had the chance to visit the ACALAN offices - more about that in another post.

Saturday, November 19, 2016

Some illustrated Senufo proverbs

"Yì fwù ɲara na" (welcome)
On the margins of some other research, I was recently able to pay a visit to the Centre de Recherche pour la Sauvegarde et la Promotion de la Culture Sénoufo (CRSPCS) in Sikasso, Mali. Although the purpose of going there was not primarily language-related, it is worth noting that among the CRSPCS's areas of activity are research and publication on the Senufo language(s) spoken in southern Mali, northern Ivory Coast, and western Burkina Faso.

The center was founded in 2005, the result of an effort begun by Rev. Emilio Escudero Yangüela. It serves educational roles, including in coordination with the regional museum in Sikasso, and has collected cultural objects that are on display. (A short video in French that evidently aired on Malian TV gives a more complete introduction.)

In a tour of the grounds and buildings housing collections - for which we thank Mr. Elie Yaya Bambe - one notices several outside walls that are decorated with proverbs and illustrations. Some of these follow:
Dù fànŋà kà mà sâ, ká mū dí ŋkwōō wólò, kìmàhā mpɔ́rɔ́ mūnā.
If a donkey gives you a kick, and you reply in kind, it is better than you.

Ná mū sí kàcɛ̀nnɛ̀ pyǐ kùnùŋɔ́nā, mùmàhā kǐyǎhǎ ywɔ̌hɔ̌ ɲɔ́ná.
If you want to be good to the tortoise, put it close to the water.

Kùtùnɔ̌ ká ncyɛ̌ jyègěě kǎnɛ̀ŋɛ̀nǐ, ŋkórò kǐyɛ̀.
If the monkey refuses to enter into the dew, it ends up being all alone.

Mūhà bìmâ lē mā khɔ́hɔ́ɲɛ́ɛ́n ɲìī nī, mū màhā ŋkhɔ̀hɔ̀lì mā yɛ̀.
If you put dust in the eyes of your dancing partner, you dance alone.
I did not get a clear answer about which Senufo language these proverbs are written in, but the main Senufo variety in Sikasso is Supyire. (The tone markings as seen in the paintings are reproduced as best as possible as text in the captions; English translations from the French translations. Corrections of course welcome.)

Senufo proverbs, riddles, and tales

The CRSPCS has not published Senufo proverbs, but it has produced small books of riddles and tales in Senufo with parallel French text. One riddle from Devinettes Sénoufo, Vol 1 (Elie Yaya Mpê Bamba and Bernard Delay, eds., Collection "Wu Nire," Harmattan Burkina, 2015):
Ŋuni a tɔɔn, tɛgɛlɛ bàa. (It is so long that it has no end.)
Kudo. (A path.)
Note the more sparing use of tone marks. Judging from the CRSPCS publications I saw, and by an online dictionary of Mamara/Minianka (another Senufo language), usage of tone marks in writing generally may be more sparing than what one sees in the proverbs above.

For more proverbs, there is a collection published by Timothy F. Garrard in 2001 as La sagesse d'un peuple : 2000 proverbes Senoufo (link to description; this work is not yet available online).

Monday, October 10, 2016

"Wogbɛ Jɛkɛ" & Ghanaian language input support

Came across mention on Twitter of the Ghanaian play "Wogbɛ Jɛkɛ - A Tale of Two Men" but with the Ga words in the title written "Wogb3 j3k3":
In fact, looking at Twitter and at the web via a Google search, one notes both this workaround and the correct spelling, as well as the ASCIIfied version, "wogbe jeke."

7 vowels and a 5 vowel keyboard

Ga, a Ga-Dangme language of southernmost Ghana, has a complex vowel system, with seven vowels distinguished in its writing system: a; e; i; o; and u; plus ɛ ("open e") and ɔ ("open o"). The latter two are used to write many other African languages such as Akan, Ewe, Mende, Bambara, and Lingala.1 (These characters, like a number of other Latin letters, are also in the International Phonetic Alphabet.)

Many fonts include ɛ and ɔ, but typing them is not facilitated by standard keyboards. There are keyboard layouts specially conceived for Ga (see below for a list), as well as for Akan, Ewe, and others. However, there apparently are not any keyboards to enable multilingual input - such as an Akan title included in a tweet in English. Or if there are, they are not widely used. Hence the resort to "3" for "ɛ" and ")" (the right parenthesis) for "ɔ."

In African Languages in a Digital Age (p. 61) I outlined several workarounds for text including extended Latin characters not supported in fonts or input systems, a summary that was a revision of something published a decade earlier.2 I had not, however, noted the use of numbers or symbols among the "substitution solutions." Ade Sawyerr, who has worked with Ga input issues, mentions observing these particular substitutions - "3" and ")" - as well as others, such as "rj" for the letter "ŋ" ("eng"), which is also used in Ga.

In any event, the resort in the mid-2010s to 3's and )'s to type words in languages like Ga, Akan, and Ewe that use them is evidence of missing input options on the devices used, or inconvenience of existing options, or perhaps lack of awareness of available keyboard apps on the part of users.
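The three spellings noted above (the correct one, the "3" and ")" workaround, and the ASCIIfied form) can be generated mechanically, which may help when searching for a term across all the ways people actually type it. The following is only a rough sketch: the function name and tables are hypothetical, and the tables cover just the two open vowels discussed here, not every substitution in use.

```python
# Hypothetical helper; the tables reflect only the substitutions mentioned
# in this post ("3" for ɛ, ")" for ɔ) plus the plain ASCII fallbacks.
WORKAROUND = {"ɛ": "3", "ɔ": ")", "Ɛ": "3", "Ɔ": ")"}
ASCIIFIED = {"ɛ": "e", "ɔ": "o", "Ɛ": "E", "Ɔ": "O"}


def spelling_variants(text: str) -> list[str]:
    """Return the correct spelling, then its workaround and ASCIIfied forms."""
    workaround = text.translate(str.maketrans(WORKAROUND))
    asciified = text.translate(str.maketrans(ASCIIFIED))
    # dict.fromkeys drops duplicates (e.g. for text with no open vowels)
    # while keeping the original order.
    return list(dict.fromkeys([text, workaround, asciified]))


print(spelling_variants("Wogbɛ Jɛkɛ"))
```

Going the other way (turning "3" back into "ɛ") is much harder to do safely, since a "3" or ")" in ordinary text is usually just a digit or a parenthesis.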

Some keyboard layouts for Ga

Over the last couple of decades, and especially since the availability of keyboard utilities like Keyman and Microsoft Keyboard Layout Creator (MSKLC), there have been many keyboard layouts developed for languages such as those of Ghana that have extended Latin orthographies. A full discussion is beyond this blog post, but generally speaking, keyboards incorporating characters not on the standard computer keyboards work either through changing key assignments (such as "q" is not used in Ga, so "ŋ" is substituted for it) or via a combination or sequence of key strokes. The solution with changed keys seems to be more common on mobile device applications, whereas both approaches are found in keyboard layouts used on computers.
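To make the two approaches concrete, here is a toy sketch of each. Neither table is taken from a real Keyman or MSKLC layout: the "q" reassignment follows the example above, while the ";" prefix used for key sequences is purely an assumption for illustration.

```python
# Illustrative tables only; they do not reproduce any actual keyboard layout.

# Approach 1: reassign a key the language does not use ("q" -> "ŋ" in Ga).
REASSIGNED_KEYS = {"q": "ŋ", "Q": "Ŋ"}

# Approach 2: a key sequence, here a hypothetical ";" prefix plus a base letter.
SEQUENCES = {";e": "ɛ", ";o": "ɔ", ";n": "ŋ"}


def type_with_reassignment(keystrokes: str) -> str:
    """Replace each reassigned key with its new character."""
    return "".join(REASSIGNED_KEYS.get(key, key) for key in keystrokes)


def type_with_sequences(keystrokes: str) -> str:
    """Scan left to right, turning known two-key sequences into one character."""
    output = []
    i = 0
    while i < len(keystrokes):
        pair = keystrokes[i:i + 2]
        if pair in SEQUENCES:
            output.append(SEQUENCES[pair])
            i += 2
        else:
            output.append(keystrokes[i])
            i += 1
    return "".join(output)


print(type_with_reassignment("q"))
print(type_with_sequences("wogb;e j;ek;e"))
```

The trade-off is visible even in this sketch: reassignment costs one keystroke per character but takes a key away from other languages, while sequences keep every key available at the price of extra keystrokes.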

Kasahorow Android keyboards menu selection
A selection of Ga keyboards:
There likely are others for Ga (and the closely related Dangme). There definitely are a number for other languages of Ghana such as Akan (or its varieties, Twi Ashanti, Twi Akuapem, and Fante), Ewe, and Dagaare.

However, more could be done to facilitate multilingual typing, so that one doesn't have to switch keyboards or keep track of key sequences to insert something like Wogbɛ Jɛkɛ in an English tweet, or say a Hausa word with a hooked letter in a text in Akan (hooked letters are not part of the Akan orthography). Could for example an extra line of keys be added to touchscreen keyboards - say on a Ghana English keyboard - with the extra characters needed for Ghanaian languages?

About "Wogbɛ Jɛkɛ"

Wogbɛ jɛkɛ is a Ga term with meanings of "we have come from far" and "our journey is still long." It is used in the titles of two plays written by Chief Abdul Moomen Muslim about historical events, beginning with "Wogbɛ Jɛkɛ: Birth of a Nation," which depicts pre-colonial history of what is now Ghana, and followed by "Wogbɛ Jɛkɛ: The Tale of Two Men," which is centered around the stories of J.B. Danquah and Kwame Nkrumah during Ghana's independence struggle.

1. Some Nigerian languages like Yoruba and Igbo instead use sub-dotted characters - ẹ and ọ - for these vowels.
2. Don Osborn, 2001, "The knotty problem of using African languages for e-mail and internet," Balancing Act News Update, 69.

Friday, September 30, 2016

Internationalizing computer science in Africa

Last year I posted on whether Unicode and internationalization (i18n) are included in any computer science curriculum in Africa. A recent comment to that post by Andre Schappo asking whether there are any organizations in Africa promoting internationalization of university curricula more generally offers another angle to approach this issue.

Part of Unicode charts for Ethiopic/Ge'ez
Andre's question follows a post on his blog about two organizations that promote internationalization of teaching curricula, one in the UK and the other in Australia. Depending on how one defines promotion of internationalization in higher education, one might add many other initiatives and consortia which seek in one way or another to develop and support international or global studies. The degree to which such efforts overlap with or might impact the content of computer science courses is an interesting question. In my limited experience, international/global studies mainly addresses disciplines in other areas (social sciences, humanities, certain applied disciplines). It certainly is worth asking how a program of internationalization at a university would apply to computer science and see how the discussion goes.

However, in the case of Africa - and also Asia - internationalization of the computer science curriculum would seem to follow as much from attention to localization as to international and global perspectives.

In any event, this issue of how Unicode and i18n figure in computer science instruction - worldwide as well as in Africa - is one that is important for technical and language planning reasons as well as for the same reasons that motivate attention to internationalization in higher education generally.

Thursday, September 08, 2016

International Literacy Day: Let them write!

One of the most common objections I have heard from international development colleagues about literacy training in African languages is "What will they read?" While it is true that relatively little is published in some African languages, and next to nothing in others, such a view has problems on several levels. For example, it's easier to learn in one's first language, literacy skills in one language facilitate learning other languages, and there is a cultural cost to always and only associating formal learning with a Europhone second language. But one of the most important points in my opinion, and one that I have offered as a primary defense of literacy in first languages of Africa, is that neo-literates* can write - maybe just a little, like a ledger, or maybe a lot, in stories that express and communicate in their own way.

So it is a pleasure to see the theme for this year's International Literacy Day (ILD; 8 September 2016): "Reading the Past, Writing the Future."

Are there examples of newly literate people in Africa writing in African languages? Yes of course. One is the Senegalese organization Associates in Research and Education for Development (ARED), which has actually published writing by its students. I have also heard of literacy students just writing with this new tool. There are certainly many more.

With the association of literacy with goals of "lifelong learning" - per the 2030 Agenda for Sustainable Development - there should be a way to support and encourage neo-literate writing in first languages on a wider and more systematic basis. Not just for fun, though hopefully at least that, but for adding many diverse voices to writing the future.

Additional notes

Two African organizations were recognized this year with the UNESCO Confucius Prize for Literacy (which along with the King Sejong Literacy Prize is awarded annually on ILD):
  • the South African Department of Basic Education’s ‘Kha Ri Gude Mass Literacy Campaign’
  • the Direction de l’alphabétisation et des langues nationales in Senegal for its ‘National Education Programme for Illiterate Youth and Adults through ICTs’
Both programs sound interesting. I'd like to know more about how the Senegalese program used its national languages (and which ones) in ICT.

For a very interesting discussion of ILD from Malawi, see Steve Sharra's blog, Afrika Aphukira: Literacy, Language and Power: Thoughts on International Literacy Day 2016

* "A neo literate is an individual who has completed a basic literacy training programme and has demonstrated the ability and willingness to continue to learn on his or her own using the skills and knowledge attained without the direct guidance of a literacy teacher." APPEAL - Training Materials for Continuing Education Personnel (ATLP-CE) - Volume 2: Post-Literacy Programmes (APEID - UNESCO, 1993, 112 p.)

Tuesday, September 06, 2016

VOA Hausa Digital Content Editor

The Voice of America (VOA) is hiring a Digital Content Editor for its Hausa service. Normally I do not post jobs on Beyond Niamey, but rather do so occasionally on the Facebook African languages group. In this case I am making an exception since it seems that the person hired by VOA will be in a position to possibly help the organization finally move its Hausa web content from an ASCIIfied version to the Boko orthography - a topic that has been discussed previously on this blog.

Links to the position announcement are below, but first a quick review of the issue. The Latin-based "Boko" alphabet for Hausa includes several modified letters (technically called "extended characters") that stand for sounds not represented in the alphabet as used in English, French or other European languages. Sometimes called "hooked letters" they include: ɓ ; ɗ ; ƙ ; and in Niger, ƴ - in Nigeria 'y is written for the same sound as the last one. The capital letter forms of the four hooked letters are Ɓ Ɗ Ƙ Ƴ.
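For readers who want to see how these letters are identified in Unicode, the short sketch below lists the code point and official character name of each hooked letter along with its capital form. It uses only Python's standard `unicodedata` module; the variable name is just for illustration.

```python
import unicodedata

# The four hooked letters of the Boko alphabet (ƴ is used in Niger).
HOOKED = "ɓɗƙƴ"

for ch in HOOKED:
    upper = ch.upper()  # Unicode case mapping gives the capital hooked form
    print(f"U+{ord(ch):04X} {ch}  {unicodedata.name(ch)}  ->  U+{ord(upper):04X} {upper}")
```

Each of these is a separate character in its own right, not a plain b, d, k, or y with a decoration added, which is why dropping the hook changes the spelling rather than just the styling.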

When VOA and other international radio services - notably BBC, CRI, and RDW - began websites for their respective Hausa services, the Unicode standard, which facilitates display of extended Latin characters and diverse writing systems on the internet, was not in widespread use (RFI added its Hausa service later). Evidently this was the reason for resorting to an ASCIIfied rendering of Hausa text (with b, d, k, and y instead of the hooked characters, which can change meanings) - older systems then in use among the audience may not have been able to handle the Unicode-encoded hooked letters.

That argument is losing credence, if it is not already meaningless. The number of systems in use old enough not to have Unicode fonts (now the norm, and the earliest of them were already in systems over a decade ago) must be very small. Moreover, all five international radio Hausa sites use UTF-8, a Unicode encoding that can represent the hooked letters.
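A small sketch may help show both halves of this point: that ASCIIfication throws information away, and that UTF-8 has no trouble carrying the hooked letters. The `asciify` function is a hypothetical stand-in for whatever process the sites use, not a description of their actual tooling.

```python
# Hypothetical ASCIIfication table: each hooked letter becomes its plain base.
ASCIIFY = str.maketrans({
    "ɓ": "b", "ɗ": "d", "ƙ": "k", "ƴ": "y",
    "Ɓ": "B", "Ɗ": "D", "Ƙ": "K", "Ƴ": "Y",
})


def asciify(text: str) -> str:
    """Replace hooked letters with plain ASCII letters (a lossy conversion)."""
    return text.translate(ASCIIFY)


# Two distinct letters collapse into one, so the original cannot be recovered.
print(asciify("ƙ") == asciify("k"))

# In UTF-8, each hooked letter is simply a two-byte sequence.
for ch in "ɓɗƙƴ":
    print(ch, ch.encode("utf-8").hex(" "))
```

Because the mapping is many-to-one, restoring Boko spelling from ASCIIfied text would need a dictionary or a human editor; it cannot be done by a simple reverse substitution.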

So what is the current state of use of the Boko orthography (with the hooked letters) on the five sites - VOA, BBC, CRI, RDW, and RFI? I used a new way of evaluating them - actually bringing back an old trick - which is to search just the letters on the sites with Google. The best way is to use Google advanced search, or just put a sequence like this in the search window of the usual Google page:

ƙ OR ɓ OR ɗ OR ƴ

This pulls up all pages on the site with at least one of these hooked letters. You can substitute the domain of the site you want to evaluate. My results were: BBC 16 pages; RDW 7 pages; VOA, CRI, and RFI all 0. Not impressive.
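For anyone wanting to repeat this check, the query can be assembled programmatically, and the same test can be applied to page text one has already saved. This is a sketch under assumptions: it supposes the site restriction is expressed with Google's `site:` operator, the domain shown is a placeholder, and both function names are hypothetical. It does not query Google or fetch any pages.

```python
HOOKED_LETTERS = ["ƙ", "ɓ", "ɗ", "ƴ"]


def build_query(domain: str) -> str:
    """Build a query restricted to one site that matches any hooked letter."""
    return f"site:{domain} " + " OR ".join(HOOKED_LETTERS)


def uses_boko_letters(page_text: str) -> bool:
    """Return True if the text contains at least one hooked letter (any case)."""
    lowered = page_text.lower()
    return any(letter in lowered for letter in HOOKED_LETTERS)


# "hausa.example.org" is a placeholder, not one of the five sites.
print(build_query("hausa.example.org"))
```

Counting how many saved pages pass `uses_boko_letters` would give a rough local equivalent of the page counts reported above, without depending on how a search engine indexes single letters.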

What's holding them back? Inertia? Lack of a keyboard layout to easily type with the hooked letters? Lack of a spell checker for Hausa in Boko orthography?

In any event, the new Digital Content Editor for the VOA Hausa service would be in a position to make a significant contribution to that service's web content, with secondary effects on other Hausa language websites.

The position has two listings on the site: one for US citizens; and one for non-US citizens. (This sort of dual listing is normal; you see it also sometimes for internal candidates in an agency and for external candidates applying from outside the agency.) The position was announced today, 9/6/16, and closes 9/20/16.

Saturday, September 03, 2016

Facebook, ISOC, and A12n

In his recent visit to Lagos, Nigeria, Facebook founder and CEO Mark Zuckerberg indicated that Facebook will add more African language interfaces. Meanwhile, at the African Peering and Interconnection Forum (AfPIF2016) in Dar es Salaam, Tanzania, the Internet Society (ISOC) released a report entitled "Promoting Content in Africa," which highlights the importance of internet content in African languages for full access by Africans.

These two developments, concerning on the one hand localization of the software for a popular social media platform, and on the other hand the creation of content, highlight the dual aspects of Africanization (A12n) of information and communication technology in/for Africa. As these processes develop, it would be useful to find ways to integrate them as appropriate, and foster collaboration among organizations and individuals involved in either or both. (That was the intent of the African Network for Localisation, ANLoc, albeit with a focus mainly on the software and enabling aspects.)

It is possible, as the ISOC report notes, for content to be developed or translated in a language even when the software on which it is created is not localized in it. And that certainly would be the case for the less widely spoken languages, at least in the near term. However, the availability of software interfaces - whether for social media like Facebook or for production software - in at least the major African languages, would probably help even for the less-spoken ones.

Facebook sign-up in Hausa.
Facebook currently is available in the following African languages (links are to Wikipedia articles): Afrikaans; Arabic; Hausa; Kinyarwanda; Malagasy; Somali; Swahili; and Tamazight

One of the contributors to the ISOC report, Dawit Bekele, who is ISOC's African Bureau Director, was a participant in the PanAfrican Localisation Workshop in Casablanca, June 2005, and the Pan African Research on L10N Workshop & Localization Blitz in Marrakech, February 2007.

Wednesday, August 31, 2016

Missing "macrolanguages" of Africa

Screenshot from VOA's Kinyarwanda/Kirundi site
The Voice of America (VOA) recently had a job opening for "International Broadcaster (Multimedia) (Kirundi/Kinyarwanda)." Kirundi and Kinyarwanda are the mother tongues, national languages, and co-official languages in, respectively, Burundi and Rwanda. And they are mutually intelligible, with only minor differences, such that apparently a fluent speaker of either could work on a program serving speakers of both. But there is no term covering both - unless one counts the hyphenated Rwanda-Rundi - and no language coding category to cover material designed for use across the two.

This is a situation encountered with many languages in Africa, and one for which there is at least one potential solution - the neologism and language coding category "macrolanguage." There are actually some macrolanguages defined in Africa, but these are few, and as I discuss below, kind of accidental. Is it time to systematically identify (and code) macrolanguages in Africa?

What defines a language?

For most of us, the distinction between languages seems pretty straightforward. But beyond the most spoken international languages - those used officially by the United Nations or ones you are likely to see on a school curriculum - the situation is often more complex. Sometimes two or more closely related languages are so similar that their speakers can understand each other, but sometimes variations within one language can make understanding difficult. An earlier posting on this blog looked at the notion of "neighbor languages" in Scandinavia and Africa. A broader consideration of these issues by Columbia University's John McWhorter suggests that we're really all speaking dialects, some of which benefit from written forms, and one might add, status, resources, and policy support. There is some truth to the saying that "A language is a dialect with an army and a navy."

However, the issues of what to call a "language" and where to draw the boundaries between it and another "language" are still of practical importance for communication (standardization, references, ICT use) and planning (government, business, education). There are two broad approaches in linguistics to doing this, corresponding with the splitter/lumper (or joiner) approaches to categorizing: one focusing more on distinctions, and the other focusing more on commonalities.

Without going too deeply into that discussion, which gets more complicated when accounting for issues of identity, names, written forms, and national boundaries, suffice it to say that in considering African languages, there are many situations where one encounters the splitter/lumper choice.

The major reference on the world's languages, Ethnologue, takes a more splitter approach, which means that speech varieties that are closely related and interintelligible may be classified as separate languages. It is their estimate of the number of languages in Africa (over 2000) that is most commonly cited, but there are other more conservative estimates. A good academic discussion of this issue entitled "How many languages are there in Africa?" was published in 2004 by Jouni Filip Maho (his estimate is under 1500).

What is a "macrolanguage"?

In brief, the term "macrolanguage" was not used in linguistic description before the inauguration of the ISO 639-3 system for encoding all languages in the late 2000s. Since that system is based on Ethnologue's "splitter" data, a new category was needed to accommodate existing codes in the earlier, less comprehensive parts of ISO 639 (1 & 2) that in many cases were more "lumper" in approach. The term macrolanguage was in effect a "shim," to borrow someone else's term, to fit the two systems together.

There are by my count 14 macrolanguages listed for Africa (names linked to the Ethnologue macrolanguage pages): Akan; Arabic; Dinka; Fulah; Gbaya; Grebo; Kalenjin; Kanuri; Kongo; Kpelle; Malagasy; Mandingo; Oromo; and Swahili. There could be others.

That brings us back to Kinyarwanda and Kirundi. How is the relationship between them different from - or more distant than - the relationships within any of the above established macrolanguages? One difference, as mentioned above, is that there is no common name to make it easy, and another is that they are dominant in different countries - perhaps analogous to the situation of Scandinavian languages?

Another curious situation is that of Mandingo, which includes several western Manding languages, but not Bambara and Jula (Dyula). Even if the latter two were considered too different from the other Manding tongues, they are close enough that one could localize software for the two together. Keep in mind also that the emerging literary standard N'Ko covers all Manding languages (in a different alphabet). Should the Mandingo macrolanguage be extended to include them all?

The four languages of southwestern Uganda - Kiga, Nkore, Nyoro, and Tooro - are close enough to be covered by Runyakitara, a proposed (but not encoded) standard which is being used in various ways, including at least some teaching and a localization of the Google interface. Should these four be considered a macrolanguage under perhaps that same name, thus finally providing a code for localization in Runyakitara?

And there are other examples around the continent that could be discussed.

What good would more macrolanguages do?

The first benefit of identifying more macrolanguages would be in language coding - the very environment in which the term was first used. The language of VOA's website for its Kinyarwanda/Kirundi service is coded as "rw" (Kinyarwanda) since there is no macrolanguage code covering both languages. Likewise, in many cases, the grouping of very close and mutually intelligible languages as a macrolanguage could facilitate localization of software and apps to serve larger populations - and those larger markets could make it more likely that such localization would be pursued and maintained.
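
As a rough illustration of that coding gap, here is a minimal Python sketch. The `MACROLANGUAGES` table is only a small, illustrative subset of the ISO 639-3 macrolanguage mappings, the `x-rwrn` entry is a made-up placeholder for a Rwanda-Rundi grouping (no such code exists), and the function name `covering_code` is likewise just for illustration.

```python
# Illustrative subset of ISO 639-3 macrolanguage -> member-language codes.
# This is NOT the full registry, only enough to show the idea.
MACROLANGUAGES = {
    "aka": {"fat", "twi"},  # Akan: Fanti, Twi
    "swa": {"swh", "swc"},  # Swahili: Swahili (individual language), Congo Swahili
}

# Hypothetical grouping for illustration only: no such code is defined.
# "kin" = Kinyarwanda, "run" = Kirundi (their ISO 639-3 codes).
PROPOSED = {
    "x-rwrn": {"kin", "run"},
}


def covering_code(lang, registry):
    """Return the macrolanguage code whose members include `lang`, or None."""
    for macro, members in registry.items():
        if lang in members:
            return macro
    return None


# With existing mappings only, Kinyarwanda has no covering code.
existing_only = covering_code("kin", MACROLANGUAGES)

# With the hypothetical grouping added, both languages share one code.
extended = {**MACROLANGUAGES, **PROPOSED}
shared_kin = covering_code("kin", extended)
shared_run = covering_code("run", extended)
```

With only the existing mappings, the lookup for Kinyarwanda finds nothing, which is the situation that leaves a site serving both audiences tagged with one language's code. Once a grouping is defined, material meant for both could carry a single shared code.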

Another benefit would be to complement the tendency in language coding towards seeking more granularity, by recognizing natural groupings of languages (for more on this, see a message to the IETF-languages list last May). This would in effect provide more balance between splitting and lumping/joining.

In the broader picture, identifying macrolanguages could have benefits for policymaking and program development involving languages within macrolanguage groups, by calling attention to the closely related languages. Especially where foreigners are involved, projects may overlook such relationships and the potential resources they may provide. For example, materials development for education and various communication needs might benefit from tapping efforts and resources in closely related languages.

(Minor edits and image added, 2 Sep. 2016)

Sunday, June 19, 2016

TED talks in African languages?

Of all the TED and TEDx talks - a genre of knowledge sharing that began in the 1980s but went "viral" with the possibilities offered by YouTube - have any been given in any African language? The question is not so easy to answer, as I'll get to below, but the process of trying to answer it gives rise to other questions such as: Could a TED talk or a TEDx event be given in one or several African languages?


TED - "Ideas Worth Spreading"

TED, an acronym for Technology, Entertainment, Design, "is a global set of conferences run by the private nonprofit organization, Sapling Foundation." The idea of the conferences is sharing of ideas "usually in the form of short, powerful talks (18 minutes or less)."

The conferences have been held mainly in North America and Europe, with a handful in Asia and Latin America. One, in 2007, was held in Arusha, Tanzania with the theme, "Africa: The Next Chapter." Many, but not all, of the talks in these events become videos featured online.

The talks, which total some "2200+" according to the website, are apparently all given in English. (The program for the 2007 conference in Arusha is not available online to check.) Quite a number of talks are subtitled in other languages, as I'll discuss further on.

TEDx - "x = independently organized event"

TEDx events, of which there are several types, are licensed by TED but organized separately. The number of TEDx events around the world is not stated anywhere I looked, but one list includes 2967 events (number from the line count in my text editor), and a nice interactive map display includes some past events that are not on that list (I randomly checked some in Africa).

The total number of talks at these independent conferences must therefore be staggering. The drop-down list in the sidebar of the TEDx languages page lists 43 languages, of which the only African one is Arabic (to that extent, my first question in the opening paragraph above would be answered in the affirmative). However, given the large number of TEDxs that have been held in many diverse locations around the world, is it possible that there have been presentations in other languages not on that list?

From a rough count of TEDx events in Africa in 2015 on the map mentioned above, there were ~80 events, with well over half in diverse locations in sub-Saharan Africa. Were presentations in places like, for example, Kano, Nigeria; Dar es Salaam, Tanzania; and Addis Ababa, Ethiopia all English-only?

Subtitling of TED talks

According to the translation page on the TED site, there has been subtitling of talks in over 100 languages (the actual count on the page is 110, thanks again to copy-paste & line-count, but that number includes some varieties of the same languages, as well as English originals). The African languages among these, with the number of talks subtitled in each, include: Afrikaans (19); Amharic (13); Arabic (2091); Arabic, Algerian (9); Hausa (1); Igbo (1); Somali (20); and Swahili (33).

The one talk (in English) with Hausa subtitles - embedded below - was given in 2003, with the subtitles evidently added in 2008. Worth noting that the Boko orthography is used, as you can see with the hooked consonants.

The one talk with Igbo subtitles does not appear to follow the standard orthography - the lack of subdot vowels is one giveaway, but also tone marks are absent. And there are untranslated English terms - the first instance I recall seeing of code-mixing in subtitles. The other language subtitles look polished, though I'm even less in a position to evaluate them.

TEDx talks, as noted above, come in various languages, and apparently some of them have same-language subtitling, although that term is not used (for example several dozen in French).

The translation/subtitling effort itself looks like a successful involvement of volunteer contributions for at least a number of languages.

TED or TEDx in African languages?

There are two ways to achieve more linguistic diversity relevant to Africa in TED talks. The first would be through expanding the translation program mentioned above. This might require some new approaches as the volunteer model may not work as well as in Northern countries. The benefit would be expanding access, particularly with some more widely spoken African languages.

The second would be to organize (more?) TEDx events that either allow presentations in African languages, or that explicitly invite presentations in one or more African language(s). This would seem to be an interesting way to bring in diverse presenters, and to develop recorded content that could be shared locally, nationally, or regionally (depending on the language demographics). Even for those without internet or mobile access to such TEDx recordings, it might be possible in some contexts to distribute video for TV and audio-only for national and community radio. And such content could of course be translated into other languages for wider dissemination.

Ideas for sharing, after all, can come in many languages.

Friday, June 17, 2016


This is the fourth in a string of posts on conferences and workshops relevant to, or specifically addressing, African languages. Only one event of all of those mentioned, however, is in Africa. More on that at the end of the post, but first the three upcoming conferences for which there are active CFPs (calls for papers/participation). The subject of the first, LESEWA, is similarities between a number of West African and East Asian languages - a theme that has long interested me as a learner of Bambara and Chinese (Mandarin). The latter two deal with a broad set of languages of generally disadvantaged status and fewer speakers, among which many African languages can be counted. The first two events are the latest of long-running conference series; the third is brand new.


The International Conference on Languages of Far East, Southeast Asia and West Africa (LESEWA) will be held in Moscow, Russia on 16-17 November 2016. This is the latest in a series of biennial conferences that began in 1990 (I am told that the idea began with Prof. Vadim Kasevich and colleagues). The CFP deadline is 1 July 2016.

LESEWA "will focus on the remarkable far-reaching parallelism in syntactic and semantic structures of the languages of the Far East, Southeast Asia and West Africa, which can be explained neither by genealogical affinity, nor by aerial factors. Both individual language investigations and typological studies are encouraged. General phonetic and general linguistics themes are especially welcome."


FEL XX Hyderabad (the 20th conference of the Foundation for Endangered Languages) will be held at the University of Hyderabad in India on 2-5 December 2016. Its theme is "Language colonization and endangerment: Long-term effects, echoes and reactions." The CFP deadline has been moved back to 1 July 2016 (the conference date was also changed).

FEL XX "aims to examine language endangerment during the colonial era, and the impact of colonization on the subsequent efforts of the independent nations and communities to revitalize their language heritage. The conference will look at continuity and change in approaches to language use." The concept of "colonialism" is broad, including not only expansion of European rule, but also historically earlier periods of domination by one people over others.


The First International Conference on Revitalization of Indigenous and Minoritized Languages will be held in Barcelona, Spain, on 19-21 April 2017. It is co-sponsored by the Universitat de Barcelona, Universitat de Vic-Universitat Central de Catalunya, and Indiana University-Bloomington. The deadline for proposals is 30 July 2016.

"The mission of the conference is to bring together instructors, practitioners, activists, Indigenous leaders, scholars and learners who speak and study these [indigenous and minoritized] languages. This international conference includes research, pedagogy and practice about the diverse languages and cultures of Indigenous and minoritized populations worldwide."

Language conferences and Africa

As noted above, of the nine events spotlighted in this and the previous three posts, all but one are outside of Africa (that one is in South Africa). To be fair, not all of them deal specifically with African languages. But in general one may fairly ask how many conferences on languages and linguistics - be they Africa focused or global in scope - take place in Africa, one of the most multilingual continents. No clear answer to offer here, but if one were to do a count, it might help to go about the task with attention to types of conferences - academic vs. policy vs. workshop-type - and to the subjects - general or focused on Africa. I have the impression that quite a number of events - conferences, expert meetings, etc. - dealing with policy and practical aspects of African language use have been held in various parts of Africa, as one would expect. On the other hand, academic conferences, whatever the topic - even African languages and linguistics - are more frequent in the Northern countries due to the number of institutions and scholars, and resources available to them for convening such events. General conferences on topics like ICT4D or endangered languages might be located anywhere, and conference series with significant African content and potential participation do seem to try to alternate regional locations to include some in Africa.

All of which is to say that my unscientific sampling of nine events may not tell us much about the choices of location of conferences on or relating to African languages. Nevertheless, it seemed worth addressing the topic given the apparent discrepancy in geographic representation.

Friday, June 10, 2016

Upcoming events: Bantu 6, Borderland Linguistics, LSSA/SAALA/SAALT, and TripleA 3

Having spotlighted ICTD 2016 last week and the upcoming TALAf 2016 workshop, here are four more conferences taking place over the next few weeks whose subjects are directly or indirectly relevant to African languages.

Bantu 6

The 6th International Conference on Bantu Languages, 20-22 June 2016, "brings together specialists in all aspects of the study of Bantu languages." It is being organized by the University of Helsinki in Finland with several partners and sponsors. The provisional program and abstracts are available on the conference site.

The series of linguistic conferences of which this event is a part considers the branch of the Niger-Congo language family known as Bantu. Bantu languages are spoken in large parts of Southern and Central Africa, as well as in East Africa.

The series, which has involved many prominent international scholars in African languages and linguistics, goes back several years with conferences in various locations in Europe (this incomplete list gleaned from several sources):
  • (First)
  • Bantu Languages: Analysis, Description and Theory, 4-7 October 2007, University of Gothenburg, Sweden
  • Bantu 3, 25-27 March 2009, Tervuren, Belgium
  • "B4ntu," 7-9 April 2011, Berlin, Germany (Bantu 4 was originally scheduled for 22-26 March 2010 at Lancaster University, UK, but had to be postponed)
  • Bantu 5, 12-15 June 2013, INALCO, Paris, France

Borderland Linguistics Conference

The Borderland Linguistics Conference will be held on 27-28 June 2016 at the University of Bristol, UK. This is not specifically related to Africa; however, the program includes three presentations on languages in Africa. Also, given the attention in this blog to "cross-border languages" in Africa, it seems especially appropriate to mention this event.

The conference theme is described this way:
The notion of border is highly complex and problematic, whether it be an officially demarcated border between two states, or a less rigorously defined meeting space of somehow differentiated social or ethnic groups. Leading theorists have proposed that a broad-reaching 'theory' of borders may in fact be infelicitous, due to the contextual specificities of each different border area that may constitute an area of study. Nevertheless, borders remain fruitful sites for scholarly inquiry, and this conference invites contributions from linguistics researchers of all levels whose work focuses on borderlands.


The LSSA / SAALA / SAALT Joint Annual Conference for 2016 will be held at the University of the Western Cape in Bellville, South Africa on 4-7 July 2016. The three organizations running the conference are: Linguistics Society of Southern Africa; Southern African Applied Linguistics Association; and South African Association for Language Teaching.

The conference theme - "Language and Linguistics in the Global South: Posing the Challenge" - is framed "within the current context of demands for radical changes to academic content and access at our universities" and encourages contributors to address "issues of decoloniality and southern theory in linguistic research and teaching." The topics of the conference include: applied linguistics; language practice; language teaching; linguistics; sign language; sociolinguistics; multilingualism; discourse analysis; and linguistic landscapes.

TripleA 3

The Semantics of African, Asian and Austronesian Languages (TripleA 3), 6-8 July 2016, Tübingen, Germany, is the third in a workshop series that "aims at providing a forum for semanticists doing fieldwork on understudied languages. Its focus is on languages from Africa, Asia, Australia and Oceania."

Semantics is a branch of linguistics concerned with the study of meaning. The TripleA 3 program includes a number of presentations on African languages.


The attentive reader will notice that three of the four events or series take place in Europe. This is partly a function of chance in the time period chosen, although it is true that Northern institutions have the resources to sponsor such meetings.

Normally it is more useful to post the calls for participation/papers (CFPs), but these are published regularly on relevant sites including Linguist List. This blog is not intended as a reliable source for such news, but will hopefully continue to carry information about interesting meetings and events relating to African languages and the information society. (That said, an upcoming post will feature two CFPs that may be of interest.)

(The section on the Borderland Linguistics Conference was updated on 14 June 2016 with information provided by its organizer, Dr. James Hawkey. Information on the 2016 LSSA / SAALA / SAALT Joint Annual Conference was added on 17 June 2016.)

Tuesday, June 07, 2016

Information on the TALAf 2016 workshop

Here is some information on the TALAf workshop (Traitement automatique des langues africaines, or natural language processing for African languages), which will take place on 4 July 2016 during the JEP-TALN-RECITAL conference in Paris, France. (For English, see TALAf workshop.)

Ten papers have been accepted for presentation at the workshop: 8 in French, 2 in English. In all, eight African languages figure in the subjects of these papers: Amazigh, Bambara, Comorian, Igbo, Maninka, Fula (Peul), Swahili, and Wolof. The program follows:

09:30-10:00  Valentin Vydrin, Andrij Rovenchak & Kirill Maslinsky
Maninka Reference Corpus: A Presentation.
10:00-10:30  Ikechukwu Onyenwe, Mark Hepple & Uchechukwu Chinedu
Improving Accuracy of Igbo Corpus Annotation Using Morphological Reconstruction and Transformation-Based Learning.
10:30-11:00  Coffee break
11:00-11:30  Moneim Abdourahamane, Christian Boitet, Valérie Bellynck, Lingxiao Wang & Hervé Blanchon
Construction d’un corpus parallèle français-comorien en utilisant de la TA français-swahili.
11:30-12:00  David Blachon, Elodie Gauthier, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker & Annie Rialland
Collecte de parole pour l'étude des langues peu dotées ou en danger avec l'application mobile Lig-Aikuma.
12:00-14:00  Lunch break
14:00-14:30  Michael Melese Woldeyohannis, Laurent Besacier & Meshesha Million
Amharic Speech Recognition for Speech Translation.
14:30-15:00  El Hadji Malick Fall, El Hadji Mamadou Nguer, Sokhna Bao Diop, Mouhamadou Khoulé, Mathieu Mangeot & Mame Thierno Cissé
Digraphie des langues ouest africaines : Latin2Ajami : un algorithme de translittération automatique.
15:00-15:30  Fatimazahra Nejme, Siham Boulaknadel & Driss Aboutajdine
Développement de ressources pour la langue amazighe : Le Lexique Morphologique El-AmaLex.
15:30-16:00  Alla Lo, Elhadji Mamadou Nguer, Abdoulaye Youssoupha Ndiaye, Cheikh Bamba Dione, Mathieu Mangeot, Mouhamadou Khoule, Sokhna Bao Diop & Mame Thierno Cisse
Correction orthographique pour la langue wolof : état de l'art et perspectives.
16:00-16:30  Coffee break
16:30-17:00  Mouhamdou Khoule, Mathieu Mangeot, El Hadji Mamadou Nguer & Mame Thierno Cisse
iBaatukaay : un projet de base lexicale multilingue contributive sur le web à structure pivot pour les langues africaines notamment sénégalaises.
17:00-17:30  Chérif Mbodj & Chantal Enguehard
Production et mise en ligne d’un dictionnaire électronique du wolof.

The TALAf workshops have been held every two years since 2012. They are supported by the Lexicologie, Terminologie, Traduction network, an international association that was part of the Agence universitaire de la Francophonie (AUF) until 2010.

According to the TALAf website, the roles of the workshop are as follows:
  • "to connect researchers in the field through meetings at the workshop as well as through the mailing list;
  • to pool knowledge by using open-source tools and standards (ISO, Unicode), and by publishing the resources produced under an open license (Creative Commons), in order to avoid, among other things, the loss of information when a project stops and cannot be resumed immediately for lack of funds;
  • to develop a set of good practices based on the experience of researchers in the field. This involves developing methodologies that are simple and economical in software purchase costs for building resources, exchanging on techniques that make it possible to do without certain nonexistent resources, and finally avoiding wasted time and energy."