Wednesday, December 30, 2015

Unicode in African computer science curricula?

A current thread on the Unicode email list asks about Unicode in the computer science curricula of universities or other schools - on the premise that most students in this area learn little to nothing about it. The thread also touches on the broader level of familiarity of computer technicians with internationalization ("i18n").

I'd like to relay this question in the direction of information and communications technology (ICT) experts in Africa: Are there examples of Unicode being taught as part of courses in any African universities? Or of computer science programs that do not include Unicode or internationalization in any courses?

For those not familiar with Unicode (and the ISO/IEC 10646 universal character set), it is the character encoding standard that permits use of any or all writing systems on computers and across the internet. (For more, see the Unicode Consortium's page, "What is Unicode?")

Relevance of Unicode in Africa


Given the use of extended Latin characters in written forms of many languages in Africa, and the use of non-Latin scripts for others, it would seem that Unicode would be a natural subject to introduce to students who plan to work in ICT in Africa. So perhaps there are some good examples to share?
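To make the point about extended Latin characters a bit more concrete, here is a minimal Python sketch that reports the Unicode code point, the official character name, and the number of UTF-8 bytes for the three hooked letters of standard Hausa orthography. The helper name describe and the choice of these three letters are mine, purely for illustration.

```python
import unicodedata

# Hooked letters of standard Hausa orthography: b, d and k with hook.
# Written as escapes so the example does not depend on the editor's encoding.
HAUSA_HOOKED = ["\u0253", "\u0257", "\u0199"]


def describe(ch):
    """Return (code point label, Unicode name, UTF-8 byte count) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))


for ch in HAUSA_HOOKED:
    print(ch, *describe(ch))
```

Each of these letters has its own code point outside the ASCII range, which is why software that assumes one byte per character tends to mishandle them.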

For instance, last September there was news that computer science students at American University of Nigeria (AUN) developed phone apps in Hausa and Fulfulde for teaching literacy. This requires some knowledge of Unicode, implying that perhaps the students learned about it at AUN. (I'm seeking more info on this for an upcoming post in the "Really smart mobiles know African languages" series.)

On the other hand, the persistence of an 8-bit "special font" in Mali would seem to indicate that in that country at least, the word about Unicode hasn't spread to people who should know about it. Are there other examples of Unicode not being used where it would help most?
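A rough illustration of why an 8-bit approach runs into trouble: a single-byte encoding has only 256 slots, so letters such as ɛ, ɔ, ɲ and ŋ (used in Bambara, among other languages) have no standard place in it. In the sketch below, Latin-1 stands in for a generic 8-bit encoding. That is an assumption on my part, since I don't know the internals of the font used in Mali, and the function name is hypothetical.

```python
def fits_in_8_bit(text, encoding="latin-1"):
    """Return True if every character in text can be stored in the given single-byte encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Open e, open o, n with left hook, eng (written as escapes for portability).
extended_letters = "\u025b\u0254\u0272\u014b"

print(fits_in_8_bit("abc"))              # plain ASCII letters
print(fits_in_8_bit(extended_letters))   # extended Latin letters
print(extended_letters.encode("utf-8"))  # the same letters as UTF-8 bytes
```

Text typed with a font that reassigns 8-bit slots to such letters only displays correctly where that same font is installed, whereas Unicode text carries the identity of each character with it.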

(See also: "Unicode and the architecture of ICT," 30 June 2015)

Friday, December 25, 2015

List of African languages on iPhone6s

iPhone 6s+, part of "Other languages" menu
Following up on the mention in the previous post of support for African languages on Apple's iPhone 6s+ (iOS 9), I wanted to take a moment to list what those languages are. Again, as far as I could check, none of them is fully supported, but the list appears to be the most extensive on any smartphone or mobile device available.

But first, how prominently do iPhones figure in the rapidly expanding use of mobile devices in Africa? According to a 2010 Foreign Policy article critiquing Apple for not marketing iPhones in Africa, "for the vast majority of Africans, Apple effectively doesn’t exist" (a statement that in my impression has long been true even in sectors with access to ICT). That picture is changing, at least in some measure. In a March 2014 article on "unauthorized" iPhones in Nigeria, IT News Africa stated that "Apple’s iPhone is one of the most desirable brands in most parts of Africa." And Apple itself, in its page on "Wireless carrier support and features for iPhone in Africa," lists carriers for 36 of Africa's 54 countries.*

Given the high cost of iPhones, however, even the less-expensive models are still upmarket items (see a discussion of relative cost per income of iPhone 5c in China, India, and Africa). So the benefits of Apple's apparent commitment to localization in African languages will not accrue directly to most people, though it may help raise the bar for other systems.

Here is the list of African languages on iPhone 6s+, 74 in total (extracted manually from the full list of ~240 "Other languages"; names linked mainly to Wikipedia articles; any errors or omissions are mine; see also notes below list):

Notes to above list (numbering for convenience only):
  1. English names as used on the iPhone were retained for this list; in a few cases, additional names have been added for clarity.
  2. Listings for the two languages that are offered each in two different scripts were consolidated (Arabic & Latin for Soninke, and Latin & Vai for Vai).
  3. Notice different approach to Songhay language(s) in Firefox OS (previous posting), which is localized for "Songhay," and iPhone, which follows Ethnologue and ISO 639-3 listing "Koyra Chiini" and "Koyraboro Senni" separately (as well as "Zarma"). Would be interesting to know how these efforts compare.
  4. I did not find some major languages on the list, such as Amharic, Hausa, Kongo, and Tigrinya.
  5. The languages that are in the list include some that are very widely spoken and/or official, and some that are less-widely spoken - this may be a function of response to Apple's encouragement of developers to localize more apps rather than a planned approach (seeking more information).

* The 18 countries not on the list (which had last been updated on 6 Nov. 2015 at time of access) are: Benin, Burundi, Cape Verde, Comoros, Djibouti, Equatorial Guinea, Eritrea, Ethiopia, Gambia, Lesotho, Liberia, Mauritania, Sao Tome e Principe, Seychelles, Somalia, South Sudan, Sudan, and Swaziland.

Thursday, December 24, 2015

Really smart mobiles know African languages, Part 1

Support for African languages on mobile devices (cellphones, smartphones) is quietly getting better. Here's a quick look at some dimensions of that process. This is a big topic - one with many products (devices, apps) and initiatives, in the context of rapidly expanding access to mobile devices on the continent. So in this and two following posts I will simply try to highlight some aspects with the expectation of returning to the topic and the hope of feedback about accuracy and completeness.

Potential support for a language in a device has several different aspects. With English and other major international languages we have them all, and as a single package. But that's not the case with many languages, especially "less-resourced" ones. So it's helpful to begin by considering the different ways a device may or may not support use of a language. For simplicity one might start with three ways in which languages are supported in mobile devices:
  • fully localized interface - where the menus are in the language; we might call this the "medium of interaction"
  • input systems - which permit typing in the language where some letters of the alphabet, or an entire writing system, are not supported by standard Western keyboards
  • apps and content about African languages (dictionaries, learning apps) - what might be called "subject of interaction"
This and two following posts will look at each of these aspects in turn.
Lumia 830, chosen languages menu in Hausa

Localized interfaces

A couple of months ago I took a look at a Lumia 830 phone with menus in Hausa (pictured), noting use of the Boko orthography (not ASCIIfied). Checked back recently and the only other African languages were still Afrikaans, Arabic, and Swahili. This line of phones uses the Windows Phone mobile operating system, so presumably other mobile products using the same system have similar language interface options.*

The current version of Android - 6.0 "Marshmallow" - boasts "74+ languages." If there is a list of these, it is hard to find, and the reality may be more complicated, judging by a random look at some phones running Android. Some new Samsung phones featured around a dozen language options, none from Africa; and a Samsung Galaxy S 6 edge+ had about 60 but none from Africa other than Arabic. A Blackberry PRIV phone had close to 150, with some African languages included. These higher numbers (both with Android 5.x) include multiple locales for some languages (indeed, the latest list found online was one from 2012 showing 57 "languages and locales supported by Android," but only 37 distinct languages - it included no African languages apart from Arabic for Egypt).
Blackberry/Android PRIV, part of language selection menu
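The gap between a count of "languages and locales" and a count of languages comes from the fact that one language can appear under several locale identifiers. A minimal sketch of the difference, using a made-up handful of locale tags (this is not Android's actual list, and the function name is hypothetical):

```python
def count_languages(locale_tags):
    """Count the distinct language subtags in a list of locale identifiers."""
    languages = {tag.replace("-", "_").split("_")[0].lower() for tag in locale_tags}
    return len(languages)


# Hypothetical sample for illustration only, not the list shipped with any device.
sample = ["en_US", "en_GB", "en_AU", "fr_FR", "fr_CA", "ar_EG", "sw_KE", "sw_TZ"]

print(len(sample), "locales,", count_languages(sample), "languages")
```

A menu built from locale identifiers will therefore look longer than the number of languages actually covered.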

Mozilla Firefox OS has an impressive number of localization projects. However, as is the case with Android, it is hard to find a simple list of languages supported, and by what version (a set of older lists is available on Wikipedia). The African language localization projects for Firefox OS in the overall list are (links are to "team" pages; for other African language teams, see below**): Afrikaans; Amharic; Arabic; Bambara; Ewe; Fulah; Hausa; Igbo; Lingala; Luganda; Malagasy; Songhay; Swahili; Tswana; Wolof; Xhosa; Yoruba; and Zulu. Unfortunately Mozilla recently pulled back on its efforts to promote Firefox OS due to concerns about the quality of the user experience it was offering (though a Hong Kong company subsequently indicated it will continue development of the system).

Apple's current iPhones show an incredible list of "Other languages" including 74 from Africa by my count (two views of parts of the long language selection menu below give an idea; see next posting for a list of those languages). However, when I asked about them in an Apple store, I was told these were not yet supported, although there might in some cases be third party apps offering support. On closer look with the Fula/Pulaar option, I noted that the calendar had Pulaar abbreviations for months, but that everything else was in the default - in this case, English. In other words, the "other languages" may apparently have some, but not complete, support. The only African language in the shorter list of fully-supported languages is Arabic (see specs).
iPhone 6s, shots of "Other languages" selection menu

On the whole, it seems that there is significant progress on interfaces in some major African languages on mobile devices, and potential for more. It is interesting to note some less-widely spoken African languages listed in the iOS language selection - I would like to know more about the thinking there. Other questions include how the volunteer-driven open source model will be able to keep up, let alone expand language offerings, and what level of commitment the commercial operations have over the long term.

What level of communication and collaboration is there among people working on the same language? This would be very important, I think, for long-term success of efforts in less-resourced languages, though competitive models militate against it.

Also, is anyone researching how these various African language interfaces are being used, and what sort of feedback there is from users?

The next post in this series will look at input systems for African languages on mobile devices.

* Noting that Windows 10 has, in addition to the 4 African languages seen on Lumia phones (links to Wikipedia articles): Amharic; Fula; Igbo; Kinyarwanda; Northern Sotho; Setswana; Tamazight (both Latin & Tifinagh); Wolof; Xhosa; Yoruba; and Zulu. Is it just a matter of time before these get to the Windows phones?
** Mozilla language teams not working on Firefox OS are (links to team pages): Acholi; Akan; Kinyarwanda; Ndebele (South); Northern Sotho; Siswati; Southern Sotho; Tsonga; and Venda. Here too, one might ask what the possibilities are of moving on to mobile localization.

Wednesday, December 16, 2015

More on US Census Bureau & African languages

Last July I posted on the U.S. Census Bureau's coverage of African languages spoken in the United States. That focused on names and categories used (which I understand will be reviewed for possible revision), and included a map from Slate based on the Bureau's data showing the most spoken African languages or categories by State.

In October, the Census Bureau released its detailed data on over 300 languages (and language categories) spoken in the U.S. A summary table of this data was featured in an article written by Nikhil Sonnad last month on Quartz (and on CityLab under a different title). There is a small error in the text of that article, where it mentions "Sudanese" (spelled like the nationality) as a language - that should actually be "Sundanese" (spelled correctly in the table), which is spoken in Indonesia.

Below is a table with information excerpted from the Census Bureau's data, showing the numbers for the African languages discussed in my previous post, sorted by number of speakers. I have added Krio and Pidgin, which were omitted from that post. "African" stands for "African (not further specified)" in the Bureau's list of languages. The total number of speakers of African languages as defined by the Bureau - all of the below except Afrikaans, Arabic (which of course is spoken in Southwest Asia as well as North Africa), Krio, Malagasy, and Pidgin - is 894,499.
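For anyone who wants to check the arithmetic, here is a small Python sketch that adds up the rounded estimates from the table in this post for the categories counted as African languages, and compares the result with the total quoted above. The dictionary and variable names are mine. A small difference is what one would expect, since the detailed estimates are rounded to the nearest multiple of five.

```python
# Rounded estimates copied from the table in this post, leaving out
# Afrikaans, Arabic, Krio, Malagasy and Pidgin as described in the text.
AFRICAN_CATEGORIES = {
    "Kru, Ibo, Yoruba": 322_255,
    "Amharic": 195_260,
    "Cushite": 122_445,
    "Swahili": 88_685,
    "Bantu": 56_685,
    "Fulani": 30_475,
    "Mande": 29_835,
    "African": 12_320,
    "Chadic": 8_565,
    "Sudanic": 8_510,
    "Nilotic": 6_890,
    "Efik": 5_620,
    "Berber": 2_940,
    "Gur": 1_310,
    "Nilo-Hamitic": 1_275,
    "Mbum (and related)": 715,
    "Nubian": 305,
    "Nilo-Saharan": 270,
    "Saharan": 80,
    "Khoisan": 55,
}

QUOTED_TOTAL = 894_499  # total given in the text of this post

rounded_sum = sum(AFRICAN_CATEGORIES.values())
print("Sum of rounded estimates:", rounded_sum)
print("Difference from quoted total:", QUOTED_TOTAL - rounded_sum)
```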



                    Number of     Margin of  Speak English less    Margin of
Language            speakers (1)  error (2)  than "Very Well" (1)  error (2)
Arabic                   924,374     13,743               341,425      5,888
Kru, Ibo, Yoruba         322,255      7,681                64,690      2,487
Amharic                  195,260      6,368                81,385      3,479
Cushite                  122,445      4,437                59,495      2,817
Swahili                   88,685      3,414                22,055      1,913
Bantu                     56,685      2,641                16,635      1,574
Fulani                    30,475      2,022                11,745      1,193
Mande                     29,835      2,461                10,370      1,171
Afrikaans                 23,010      1,525                 1,885        318
African                   12,320      1,508                 5,000        997
Krio                      10,560      1,240                 2,820        718
Chadic                     8,565        991                 2,275        426
Sudanic                    8,510      1,317                 3,935        710
Nilotic                    6,890      1,184                 2,235        490
Efik                       5,620        775                   905        305
Pidgin                     4,445        636                 1,100        352
Berber                     2,940        756                 1,630        472
Gur                        1,310        529                   405        272
Nilo-Hamitic               1,275        644                   575        327
Malagasy                     720        231                   225        101
Mbum (and related)           715        353                   370        269
Nubian                       305        234                   185        175
Nilo-Saharan                 270        183                   155        127
Saharan                       80         95                   (D)        (D)
Khoisan                       55         89                    20         32

Notes:
1. Detailed-language estimates are rounded to the nearest multiple of five. Aggregate estimates (in this selection from the original, only Arabic) are unrounded and appear in table B16001 (http://factfinder.census.gov/bkmk/table/1.0/en/ACS/13_5YR/B16001/0100000US). Detailed-language estimates may not sum to aggregate estimates because of rounding.
2. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data at http://www2.census.gov/programs-surveys/acs/tech_docs/accuracy/MultiyearACSAccuracyofData2013.pdf). The effect of nonsampling error is not represented in these tables.
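As a worked example of note 2, the lower and upper bounds are simply the estimate minus and plus the published margin of error. The Python sketch below applies that to two rows of the table. The function name is hypothetical, and clamping the lower bound at zero is a choice made in this sketch (a count of speakers cannot be negative), not something stated in the Bureau's notes.

```python
def confidence_bounds(estimate, margin_of_error):
    """Return (lower, upper): the estimate minus and plus the 90 percent margin of error."""
    lower = max(0, estimate - margin_of_error)  # clamped at zero: an assumption of this sketch
    upper = estimate + margin_of_error
    return lower, upper


print("Arabic: ", confidence_bounds(924_374, 13_743))
print("Khoisan:", confidence_bounds(55, 89))
```

For the smallest categories the margin of error is larger than the estimate itself, so those numbers are best read as rough indications only.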

Source: U.S. Census Bureau, 2009-2013 American Community Survey, Table 1. Detailed Languages Spoken at Home and Ability to Speak English for the Population 5 Years and Over for United States: 2009-2013. Release Date: October 2015. http://www2.census.gov/library/data/tables/2008/demo/language-use/2009-2013-acs-lang-tables-nation.xls