Tuesday, June 30, 2015

Unicode and the architecture of ICT

Latin omega, used in Kulango, and now
included in Unicode's latest version.
Unicode version 8.0 was released earlier this month, so it's a good time to take stock of what the effort to encode all the world's scripts in a single system - which includes the Universal Coded Character Set (ISO/IEC 10646) - has meant for African languages.

With a character encoding system, attention naturally falls on the characters and the scripts they belong to, and among the changes in Unicode 8.0 are some Latin-based characters used to write the Kulango language of Ivory Coast and the endangered Ik language of Uganda. These have been added to Unicode's "Latin Extended-D" block. (The Unicode standard also includes technical specifications for handling text.)
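The new characters can be inspected programmatically. Here is a minimal Python sketch using the standard unicodedata module (it assumes a Python build whose Unicode database is version 8.0 or later), looking up the Latin omega letters added to the Latin Extended-D block:

```python
import unicodedata

# U+A7B6 / U+A7B7, the capital and small Latin omega added in
# Unicode 8.0 (Latin Extended-D block) for languages such as Kulango.
for ch in ("\uA7B6", "\uA7B7"):
    print("U+{:04X}  {}  {}".format(ord(ch), ch, unicodedata.name(ch)))
```

Running this prints the code point, the glyph (if the font supports it), and the official character name for each letter.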

In past years there have been similar small additions to Unicode, as well as major additions of whole African scripts like Osmanya (2003), Tifinagh (2005), N'Ko (2006), Vai (2008), Bamum (2009), Mende Kikakui (2014), and Bassa Vah (2014).

The Ethiopic/Ge'ez script - used for several languages of Ethiopia and Eritrea, such as Amharic and Tigrinya - was first added in 1999, before the Unicode standard was widely adopted. At the time, different scripts used different coding systems, hindering use of languages with complex or non-Latin scripts, and posing particular problems for anyone wanting to combine scripts in a document or on a webpage.
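The practical difference Unicode makes here can be shown in a few lines: under Unicode (encoded as UTF-8), Ethiopic and Latin text coexist in a single string, whereas a legacy 8-bit encoding cannot represent the Ethiopic syllables at all. A small Python illustration, using the Amharic word ሰላም ("selam") as the example:

```python
# Mixing Ethiopic and Latin in one string: trivial under Unicode.
text = "ሰላም (selam) means 'peace' in Amharic"
utf8 = text.encode("utf-8")  # one encoding covers both scripts
print(len(text), "characters,", len(utf8), "bytes")

# A legacy 8-bit code page has no room for the Ethiopic syllables.
try:
    text.encode("latin-1")
except UnicodeEncodeError as err:
    print("latin-1 cannot encode:", err.object[err.start])
```

Each Ethiopic syllable takes three bytes in UTF-8, but the key point is that no separate, incompatible coding system is needed for each script.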

Unicode as an enabling architecture

With that I wanted to quote from an unsigned blog posting entitled "What are we missing out on" (9 Dec. 2014) that was part of an American University course on international communication, as I think it offers an important perspective on why encoding scripts in Unicode is important:
"Tonight’s presentation on Ethiopic language and its inclusion in Unicode presented an important element about the global digital divide because it asks the question: how can, even with access to information communication technologies and internet access, someone utilize technology if it is not available in their native language? In short, they can’t. This is an important element to consider in regard to technology and the reality of, to borrow from Laura DeNardis’s description, its architecture. As the group described in their literature related to their case study, the architecture of something has the power to include and exclude. The analogy we have used in class is that of bridges that are built low enough to prevent buses driving under them.
"In the case of Ethiopia, technology was created in a way that excludes the nation’s 90 languages because American companies created the technology in English with a western cultural perspective. Further, their commercial interests drive their actions, and there is no financial incentive to include languages in which there is no commercial demand. Therefore, not only are these groups of people excluded from the benefits of technology, we are also denied the benefit of their knowledge. As the group noted, we “feel” like we are so much more connected, but cannot assume that the majority of the information is in English and unless everyone is able to put information in the digital realm, we are missing out. After this presentation, I cannot help but to believe that we are indeed missing out. Tonight we discussed Ethiopic, but what other languages are we missing out on?"
The "architecture" of information and communication technology (ICT) in this case starts with internationalization ("i18n"), which is addressed in part by Unicode. It also includes the availability (or not) of localized ("l10n") software - especially important for major languages - and of content. Localization is in turn conditioned by factors such as education, policy (of national governments as well as of development organizations), and, as alluded to above, economics. As such, once one moves beyond the enabling architecture, the interplay of factors looks more like an "ecology."

Unicode, by adding scripts and characters, makes the written forms of African languages theoretically accessible on modern computing devices and across the internet. That's huge progress from a decade and a half ago, when various 8-bit encodings dominated (at which time I noted that support for some African-language scripts was stuck at practically the same place it had been ten years before!). But it's still only part of the process of fully addressing the linguistic dimensions of the digital divide in Africa. Until that happens, we will all be "missing out" in different ways.