Wednesday, December 30, 2015

Unicode in African computer science curricula?

A current thread on the Unicode email list asks about Unicode in the computer science curricula of universities or other schools - on the premise that most students in this area learn little to nothing about it. The thread also touches on the broader level of familiarity of computer technicians with internationalization ("i18n").

I'd like to relay this question in the direction of information and communications technology (ICT) experts in Africa: Are there examples of Unicode being taught as part of courses in any African universities? Or of computer science programs that do not include Unicode or internationalization in any courses?

For those not familiar with Unicode (and the ISO/IEC 10646 universal character set), it is the character encoding standard that permits use of any or all writing systems on computers and across the internet. (For more, see the Unicode Consortium's page, "What is Unicode?")

Relevance of Unicode in Africa


Given the use of extended Latin characters in written forms of many languages in Africa, and the use of non-Latin scripts for others, it would seem that Unicode would be a natural subject to introduce to students who plan to work in ICT in Africa. So perhaps there are some good examples to share?
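To make "extended Latin characters" and "non-Latin scripts" concrete, here is a minimal Python sketch that looks up a few such letters in the Unicode character database. The sample letters (hooked letters used in Hausa, open vowels and nasals used in Bambara, and one Ethiopic syllable as used for Amharic) are my own illustrative selection, not taken from any curriculum or from the email thread.

```python
import unicodedata

# Illustrative sample letters (chosen for this sketch, not taken from the post).
# Escapes are used so the source stays ASCII; the glyphs are shown in comments.
SAMPLES = {
    "Hausa": "\u0253\u0257\u0199",           # ɓ ɗ ƙ
    "Bambara": "\u025b\u0254\u0272\u014b",   # ɛ ɔ ɲ ŋ
    "Amharic (Ethiopic script)": "\u1200",   # ሀ
}


def describe(ch):
    """Return (code point label, Unicode character name, UTF-8 byte count)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8"))


for language, letters in SAMPLES.items():
    for ch in letters:
        code_point, name, n_bytes = describe(ch)
        print(f"{language}: {code_point} {name} ({n_bytes} bytes in UTF-8)")
```

Each letter has its own code point and name, so text in these languages can be stored, searched, and exchanged without depending on a particular font.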

For instance, last September there was news that computer science students at American University of Nigeria (AUN) developed phone apps in Hausa and Fulfulde for teaching literacy. This requires some knowledge of Unicode, implying that perhaps the students learned about it at AUN. (I'm seeking more info on this for an upcoming post in the "Really smart mobiles know African languages" series.)

On the other hand, the persistence of an 8-bit "special font" in Mali would seem to indicate that in that country at least, the word about Unicode hasn't spread to people who should know about it. Are there other examples of Unicode not being used where it would help most?
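The sketch below illustrates, in Python, why an 8-bit "special font" is a poor substitute for Unicode. It assumes the common pattern in which such a font redraws the glyphs of seldom-used Latin-1 characters as the needed letters; I have not examined the internals of the font used in Mali, so the mapping `HYPOTHETICAL_FONT_MAP` and the sample string are invented purely for illustration.

```python
# Letters used in Bambara orthography: ɛ ɔ ɲ ŋ (all above U+00FF).
BAMBARA_LETTERS = "\u025b\u0254\u0272\u014b"

# HYPOTHETICAL mapping, invented for this sketch: each special letter is
# stored as an unrelated Latin-1 character that the font redraws.
HYPOTHETICAL_FONT_MAP = {
    "\u025b": "\u00e8",  # ɛ stored as è
    "\u0254": "\u00f2",  # ɔ stored as ò
    "\u0272": "\u00f1",  # ɲ stored as ñ
    "\u014b": "\u00f0",  # ŋ stored as ð
}


def fits_in_latin1(text):
    """True if every character can be encoded in 8-bit ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def to_special_font(text):
    """Rewrite text the way the hypothetical 8-bit font would store it."""
    return text.translate(str.maketrans(HYPOTHETICAL_FONT_MAP))


sample = "b\u025b\u025b"              # sample string containing ɛ
stored = to_special_font(sample)      # what ends up in the file
print(fits_in_latin1(BAMBARA_LETTERS))  # the real letters do not fit in 8 bits
print(stored.encode("latin-1"))         # bytes on disk: b, e-grave, e-grave
print(stored == sample)                 # the stored text is a different string
```

Under this assumption the file only looks right on a machine with the special font installed; everywhere else, and to any search or sorting routine, it contains the wrong characters.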

(See also: "Unicode and the architecture of ICT," 30 June 2015)

Friday, December 25, 2015

List of African languages on iPhone6s

iPhone 6s+, part of "Other languages" menu
Following up on the previous post's mention of support for African languages on Apple's iPhone 6s+ (iOS 9), I wanted to take a moment to list those languages. Again, as far as I could check, none of them is fully supported, but the list seems to be the most extensive on any smartphone or mobile device available.

But first, how prominently do iPhones figure in the rapidly expanding use of mobile devices in Africa? According to a 2010 Foreign Policy article criticizing Apple for not marketing iPhones in Africa, "for the vast majority of Africans, Apple effectively doesn’t exist" (a statement that in my impression has long been true even in sectors with access to ICT). That picture is changing, at least in some measure. In a March 2014 article on "unauthorized" iPhones in Nigeria, IT News Africa stated that "Apple’s iPhone is one of the most desirable brands in most parts of Africa." And Apple itself, in its page on "Wireless carrier support and features for iPhone in Africa," lists carriers for 36 of Africa's 54 countries.*

Given the high cost of iPhones, however, even the less-expensive models are still upmarket items (see a discussion of the cost of the iPhone 5c relative to income in China, India, and Africa). So the benefits of Apple's apparent commitment to localization in African languages will not accrue directly to most people, though it may help raise the bar for other systems.

Here is the list of African languages on iPhone 6s+, 74 in total (extracted manually from the full list of ~240 "Other languages"; names linked mainly to Wikipedia articles; any errors or omissions are mine; see also notes below list):

Notes to above list (numbering for convenience only):
  1. English names as used on the iPhone were retained for this list; in a few cases, additional names have been added for clarity.
  2. Listings for the two languages that are each offered in two different scripts were consolidated (Arabic & Latin for Soninke, and Latin & Vai for Vai).
  3. Note the different approaches to the Songhay language(s): Firefox OS (previous posting) is localized for "Songhay," while the iPhone follows Ethnologue and ISO 639-3 in listing "Koyra Chiini" and "Koyraboro Senni" separately (as well as "Zarma"). It would be interesting to know how these efforts compare.
  4. I did not find some major languages on the list, such as Amharic, Hausa, Kongo, and Tigrinya.
  5. The languages on the list include some that are very widely spoken and/or official, and some that are less widely spoken. This may reflect how developers have responded to Apple's encouragement to localize more apps, rather than a planned approach (I am seeking more information).

* The 18 countries not on the list (which had last been updated on 6 Nov. 2015 at time of access) are: Benin, Burundi, Cape Verde, Comoros, Djibouti, Equatorial Guinea, Eritrea, Ethiopia, Gambia, Lesotho, Liberia, Mauritania, Sao Tome e Principe, Seychelles, Somalia, South Sudan, Sudan, and Swaziland.

Wednesday, December 16, 2015

More on US Census Bureau & African languages

Last July I posted on the U.S. Census Bureau's coverage of African languages spoken in the United States. That post focused on the names and categories used (which I understand will be reviewed for possible revision), and included a map from Slate, based on the Bureau's data, showing the most spoken African languages or categories by state.

In October, the Census Bureau released its detailed data on over 300 languages (and language categories) spoken in the U.S. A summary table of this data was featured in an article written by Nikhil Sonnad last month on Quartz (and on CityLab under a different title). There is a small error in the text of that article, where it mentions "Sudanese" (spelled like the nationality) as a language - that should actually be "Sundanese" (spelled correctly in the table), which is spoken in Indonesia.

Below is a table with information excerpted from the Census Bureau's data, showing the numbers for the African languages discussed in my previous post, sorted by number of speakers. I have added Krio and Pidgin, which were omitted from that post. "African" stands for "African (not further specified)" in the Bureau's list of languages. The total number of speakers of African languages as defined by the Bureau - all of the below except Afrikaans, Arabic (which of course is spoken in Southwest Asia as well as North Africa), Krio, Malagasy, and Pidgin - is 894,499.



| Language | Number of speakers¹ | Margin of Error² | Speak English less than "Very Well"¹ | Margin of Error² |
|---|---|---|---|---|
| Arabic | 924,374 | 13,743 | 341,425 | 5,888 |
| Kru, Ibo, Yoruba | 322,255 | 7,681 | 64,690 | 2,487 |
| Amharic | 195,260 | 6,368 | 81,385 | 3,479 |
| Cushite | 122,445 | 4,437 | 59,495 | 2,817 |
| Swahili | 88,685 | 3,414 | 22,055 | 1,913 |
| Bantu | 56,685 | 2,641 | 16,635 | 1,574 |
| Fulani | 30,475 | 2,022 | 11,745 | 1,193 |
| Mande | 29,835 | 2,461 | 10,370 | 1,171 |
| Afrikaans | 23,010 | 1,525 | 1,885 | 318 |
| African | 12,320 | 1,508 | 5,000 | 997 |
| Krio | 10,560 | 1,240 | 2,820 | 718 |
| Chadic | 8,565 | 991 | 2,275 | 426 |
| Sudanic | 8,510 | 1,317 | 3,935 | 710 |
| Nilotic | 6,890 | 1,184 | 2,235 | 490 |
| Efik | 5,620 | 775 | 905 | 305 |
| Pidgin | 4,445 | 636 | 1,100 | 352 |
| Berber | 2,940 | 756 | 1,630 | 472 |
| Gur | 1,310 | 529 | 405 | 272 |
| Nilo-Hamitic | 1,275 | 644 | 575 | 327 |
| Malagasy | 720 | 231 | 225 | 101 |
| Mbum (and related) | 715 | 353 | 370 | 269 |
| Nubian | 305 | 234 | 185 | 175 |
| Nilo-Saharan | 270 | 183 | 155 | 127 |
| Saharan | 80 | 95 | (D) | (D) |
| Khoisan | 55 | 89 | 20 | 32 |

Notes:
1. Detailed-language estimates are rounded to the nearest multiple of five. Aggregate estimates (in this selection from the original, only Arabic) are unrounded and appear in table B16001 (http://factfinder.census.gov/bkmk/table/1.0/en/ACS/13_5YR/B16001/0100000US). Detailed-language estimates may not sum to aggregate estimates because of rounding.
2. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data at http://www2.census.gov/programs-surveys/acs/tech_docs/accuracy/MultiyearACSAccuracyofData2013.pdf). The effect of nonsampling error is not represented in these tables.
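As a worked illustration of note 2, the following Python sketch turns an estimate and its 90 percent margin of error into lower and upper confidence bounds, using three rows from the table. Two things here are my assumptions rather than part of the Bureau's notes: flooring the lower bound at zero (a count of speakers cannot be negative), and dividing by 1.645, the conventional factor for a 90 percent level, to get an approximate standard error.

```python
# (estimate, 90 percent margin of error) copied from three rows of the table.
ROWS = {
    "Swahili": (88_685, 3_414),
    "Saharan": (80, 95),
    "Khoisan": (55, 89),
}


def confidence_bounds(estimate, moe):
    """Estimate minus/plus the margin of error.

    Assumption of this sketch: the lower bound is floored at zero,
    since a number of speakers cannot be negative.
    """
    return max(0, estimate - moe), estimate + moe


def standard_error(moe, z=1.645):
    """Approximate standard error from a 90 percent margin of error."""
    return moe / z


for language, (estimate, moe) in ROWS.items():
    low, high = confidence_bounds(estimate, moe)
    flag = " (margin exceeds estimate)" if moe > estimate else ""
    print(f"{language}: {low:,} to {high:,}{flag}")
```

For the smallest categories the margin of error is larger than the estimate itself, so those figures are best read as "very few speakers in the sample" rather than as precise counts.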

Source: 
U.S. Census Bureau, 2009-2013 American Community Survey, Table 1. Detailed Languages Spoken at Home and Ability to Speak English for the Population 5 Years and Over for United States:  2009-2013. Release Date: October 2015. http://www2.census.gov/library/data/tables/2008/demo/language-use/2009-2013-acs-lang-tables-nation.xls
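As a quick arithmetic check on the total quoted above the table, this Python sketch adds up the "Number of speakers" column for every row except the five that the Bureau does not count as African languages. The figures are copied from the table; the variable names are mine. The sum of the rounded detailed estimates comes out slightly below 894,499, which is presumably the effect of the rounding described in note 1.

```python
# "Number of speakers" estimates copied from the table above.
SPEAKERS = {
    "Arabic": 924_374, "Kru, Ibo, Yoruba": 322_255, "Amharic": 195_260,
    "Cushite": 122_445, "Swahili": 88_685, "Bantu": 56_685,
    "Fulani": 30_475, "Mande": 29_835, "Afrikaans": 23_010,
    "African": 12_320, "Krio": 10_560, "Chadic": 8_565,
    "Sudanic": 8_510, "Nilotic": 6_890, "Efik": 5_620,
    "Pidgin": 4_445, "Berber": 2_940, "Gur": 1_310,
    "Nilo-Hamitic": 1_275, "Malagasy": 720, "Mbum (and related)": 715,
    "Nubian": 305, "Nilo-Saharan": 270, "Saharan": 80, "Khoisan": 55,
}

# Rows excluded from the Bureau's "African languages" total, per the text.
NOT_COUNTED_AS_AFRICAN = {"Afrikaans", "Arabic", "Krio", "Malagasy", "Pidgin"}

BUREAU_TOTAL = 894_499  # total quoted in the post

detailed_sum = sum(
    count for language, count in SPEAKERS.items()
    if language not in NOT_COUNTED_AS_AFRICAN
)
gap = BUREAU_TOTAL - detailed_sum

print(f"Sum of rounded detailed estimates: {detailed_sum:,}")
print(f"Difference from quoted total: {gap}")
```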