Showing posts with label fonts. Show all posts

Monday, October 12, 2015

The secret life of Bambara Arial

Quite unexpectedly, yesterday I heard the names of two old 8-bit Malian computer fonts - Bambara Arial and Bambara Times - from two different people. Matt Heberger, who is coordinating work on the forthcoming Bambara translation of Where There is No Doctor, mentioned that one of the translators provided him with copies of these fonts (as .ttf files), and Sam Samake, an old friend and former Peace Corps/Mali staff member, asked how I got "Bambara Arial" to work on the internet.

It wasn't supposed to be this way once Unicode rendered such special fonts unnecessary.

The history


In the 1990s, a joint project of the Malian Ministry of Education and the French Agence de Coopération Culturelle et Technique (ACCT; precursor to the Organisation Internationale de la Francophonie), produced two modified/hacked versions of the Arial and Times fonts, replacing characters not used in the orthographies of Malian national languages with characters not present in those original fonts or other commercially available fonts of that era. Thus for example, the "q" was replaced with "ɛ"* (type "q" get "ɛ" instead).

Table from Enguehard & Mbodj 2004.* Caractère affiché is what you get;
Caractère initial is original character & the key you tap to get the new one.
This approach to adapting fonts for writing systems not supported by the pre-Unicode standards was fairly common in the 1980s and 1990s, including in a number of African countries like Mali. Fonts like these achieved what they were primarily intended for - being able to compose and print documents with the extended alphabets. One could also share digital copies of documents, but those could be properly read only with the same fonts. Different modified fonts - and Mali had several - were mutually incompatible. That was the whole reason, of course, for Unicode, which also makes it possible to share documents in any alphabet across the internet on any browser, wordprocessor, etc.
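
To make the mechanism concrete, here is a minimal sketch in Python of converting text typed with such a font into Unicode. It is only an illustration: the function name and the table are hypothetical, and the table holds just the one substitution mentioned above ("q" for "ɛ"). The other pairs would have to be copied from the published table for the specific font.

```python
# Hypothetical conversion table for text typed with a hacked 8-bit font.
# Only "q" -> "ɛ" comes from the text above; the remaining pairs must be
# added from the published table before using this on real documents.
LEGACY_TO_UNICODE = {
    "q": "\u025b",  # ɛ LATIN SMALL LETTER OPEN E
}


def legacy_to_unicode(text, mapping=LEGACY_TO_UNICODE):
    """Replace each remapped character with the Unicode letter it displayed as."""
    return text.translate(str.maketrans(mapping))


print(legacy_to_unicode("sqbqn"))  # prints "sɛbɛn"
```

A conversion like this is only safe on text known to have been composed in that particular font, since a genuine "q" (in a French passage, say) would be converted too.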

But while Unicode became the international standard, evidently at least some people in Mali kept using Bambara Arial and perhaps other similar "special fonts." In 2005, USAID-funded "Community Learning and Information Centers" (CLICs) relied on these fonts for anything done in Malian national languages (apparently not that often). It may be that technicians in these telecenters did not have Unicode explained to them in their project training or prior study of computers.

The word about obsolescence of 8-bit fonts like Bambara Arial may not have gotten too far, or maybe the notion of a need for a "special font" to process text in languages like Bambara just was too ingrained. At this point I'm just wondering how after almost 2 decades, these old fonts are still in circulation and conversation. Just two years ago, there were these references to Bambara Arial online (thanks to a Google search):
  • N'oublie pas de les ecrire en vrai Bambara ''Bambara Arial''Dans Microsoft Word [Don't forget to write them in real Bambara, "Bambara Arial", in Microsoft Word] (Facebook, 2013-1-11)
  • I would like to arrange for volunteer translators for Bambara. How can I access fonts for the Bambara alphabet (Bambara Arial for example)? (Google code Khan Academy issue tracker, 2013-3-23)

 

More to it?

Adapted from an image by Denis Jacquerye.

There's a twist to the story though. It seems that in one respect, these old fonts follow Malian orthographies better than the Unicode fonts. The letter ɲ has two upper case forms: one like the lower case letter but bigger ("n-form"), and the other like the capital N with a tail on the left leg ("N-form"). They appear on the left and right sides, respectively, of the accompanying image. The "n-form" is the one most used in Mali, and apparently in the hacked 8-bit "special fonts" like Bambara Arial; most Unicode fonts use the "N-form" (thanks to Matt Heberger for his observations on that). I doubt that this alone could account for the persistence of the old fonts, but it might be a factor.

Maybe a new life could be given to the old fonts that people are still using by reencoding them to Unicode and releasing them under recognizable names.



* Two articles in French mention modified 8-bit fonts used in Mali, showing which different characters were changed to extended Latin characters:  
Chantal Enguehard et Chérif Mbodj. "Correcteurs orthographiques pour les langues africaines." Bulag 29, 2004, pp. 51-68.
Chantal Enguehard et Soumana Kané. "Langues africaines et communication électronique : développement de correcteurs orthographiques." Agence universitaire de la Francophonie. Actes des Premières Journées scientifiques communes des réseaux de chercheurs concernant la langue, 31 mai-1 juin 2004, Ouagadougou, Burkina Faso, pp. 59-75.

Tuesday, November 25, 2014

Writing Bambara right

How to compose text in the Latin-based orthography of the Bambara language of Mali? One question raised by an ebola poster in non-standard Bambara (see previous posting) is whether the modified letters (technically "extended characters") in the Bambara alphabet discourage use of the standard orthography. There are two potential issues - fonts and keyboards - although, given the use of standard Bambara in other materials, these are evidently not the impediment they once were. I'll briefly discuss both below, after a quick intro to written Bambara.

Bambara orthography


The Bambara alphabet today includes the following characters:
a  b  c  d  e  ɛ  f  g  h  i  j  k  l  m  n  ɲ  ŋ  o  ɔ  p  r  s  t  u  w  y  z

Digraph consonants (two letters to represent one sound) have been phased out of use, such that "ny" is now "ɲ" (in Senegalese orthography this would be "ñ"). However, "sh" is still used - although the IPA borrowing for this sound - "ʃ" - is sometimes seen (just today noted it in an email by odd coincidence). Double vowels however are used, for words where the vowel sound is slightly prolonged.

Bambara is a tonal language. The two tones are rarely marked, but when they are, accent marks are used. (A change in the alphabet some years ago from "è" and "ò" to "ɛ" and "ɔ" permitted marking of tones with accents rather than underscores for low tone).
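
One practical consequence for fonts can be shown with a short Python sketch. It assumes the grave accent purely as an example of a tone mark (which accent marks which tone is not the point here), and the helper name is made up. The plain "e" plus an accent can be stored as a single precomposed character, while "ɛ" plus the same accent always remains two code points, so the font and rendering system have to position the combining mark themselves.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT, used here only as an example mark


def nfc_length(base, mark=GRAVE):
    """Number of code points left after NFC composition of base + mark."""
    return len(unicodedata.normalize("NFC", base + mark))


print(nfc_length("e"))       # 1: composes to the single character è (U+00E8)
print(nfc_length("\u025b"))  # 2: ɛ has no precomposed accented form
```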

Fonts


Time was, the lack of fonts (and before Unicode became the dominant standard, character encoding behind the fonts and the lack of compatibility among different 8-bit fonts) presented the main problem for creating and sharing text in Bambara with the extended characters ɛ, ɲ, ŋ, and ɔ.

Font support for extended Latin characters is still uneven, though current operating systems can substitute a missing character from another font (all being encoded in Unicode). As I compose this posting, I note that Blogger's default compose font lacks 3 of the 4 extended characters, judging from the obvious substitution (see the screenshot in the figure above; background color added). On the other hand, the font for the published posting does include these characters, so no substitution is necessary there.

Basically this means that most of the time, one can display the needed characters, but for aesthetic reasons, fonts that include all of those characters would be preferable. In finding fonts, it is helpful to know that the needed extended characters may be spread among several Unicode "blocks." For Bambara this means a font will have to have, in addition to the basic Latin blocks common to any font with Latin characters, the following extended blocks:
Latin Extended-A is fairly common in fonts, but not the other three above. (If needed, the "ʃ" and its capital form would be covered by the IPA and Extended-B ranges.)
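
For anyone who wants to check where a given letter falls, the lookup can be sketched in a few lines of Python. The block ranges below are taken from the Unicode block list; the function name and the choice of which blocks to include are my own, and anything outside the listed ranges is simply reported as "other".

```python
# (first code point, last code point, block name), per the Unicode block list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
]


def block_of(char):
    """Return the name of the listed block containing char, or "other"."""
    code_point = ord(char)
    for start, end, name in BLOCKS:
        if start <= code_point <= end:
            return name
    return "other"


# Lower and upper case forms of the four extended Bambara letters.
for ch in "ɛƐɲƝŋŊɔƆ":
    print(f"U+{ord(ch):04X} {ch} {block_of(ch)}")
```

The lower case and upper case forms of the same letter often sit in different blocks, which is why a font can have one without the other.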

Alan Wood's extensive list of "Unicode character ranges and the Unicode fonts that support them" is an excellent resource for finding fonts for specific Unicode ranges. (Still looking for a resource that would allow one to choose several Unicode blocks and get a list of fonts that cover them.)
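
Lacking such a resource, a rough local check of the fonts one already has is possible. The Python sketch below is only the core test, with hypothetical names: given the set of code points a font supports, it reports which of the extended Bambara letters are missing. Obtaining that set is left out. One option, assuming the third-party fontTools library is installed, would be the keys of `TTFont(path).getBestCmap()`.

```python
# Lower and upper case forms of the four extended Bambara letters.
BAMBARA_EXTENDED = "ɛƐɲƝŋŊɔƆ"


def missing_characters(supported_code_points, needed=BAMBARA_EXTENDED):
    """Return the needed characters whose code points the font lacks."""
    return [ch for ch in needed if ord(ch) not in supported_code_points]


# Hypothetical font covering Basic Latin through Latin Extended-A only.
example_font = set(range(0x0020, 0x0180))
print(missing_characters(example_font))
```

Running the same check over every font in a folder and keeping those with an empty result would give a list of fonts that cover the whole alphabet.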

Keyboards


Since display of extended characters is no longer the impediment it used to be, the big issue now seems to be how to efficiently compose text with extended characters that are not supported by computer keyboards (i.e., not via inserting a symbol in a wordprocessor or cutting and pasting characters from another source). This means use of alternative keyboard drivers or onscreen keyboards or character pickers.

In the latter category, there are a couple of websites worth noting. In both, one types from one's keyboard and then clicks on extended characters, producing text onscreen that can be copied and pasted elsewhere:
  • Lexilogos has a page for Bambara, featuring a window where one can type basic Latin characters and then click on the extended ones onscreen (and diacritics for accents).
  • Richard Ishida has a more complex IPA Character Picker enabling input of many more extended Latin characters.
Keyboard drivers enable one to use one's existing keyboard in any application. These generally use either Tavultesoft Keyman or Microsoft Keyboard Layout Creator (MSKLC). A short list of links to current keyboard drivers useful for composing in Bambara follows (with thanks to Valentin Vydrin, who shared this on the Translating Hope list). I'll add to this but encourage comments with additional information:
  • Via Mali Pense site (see under "ÉCRIRE LE BAMBARA - ka bamanankan sɛbɛn"). Note also a spell checker ("vérificateur orthographique"; see under "POUR ÉCRIRE SANS FAUTES - Fililatilennan sɛbɛnni na")
  • Via LLACAN site (see under "Saisir des caractères spéciaux sous windows.")

Thursday, September 25, 2014

N'Ko on the web: Review of experience with ebola FAQ

Here's a quick recap of the demonstration of (experiment with) posting the N'Ko text of a WHO FAQ on ebola. Shortly after I posted it, I wrote "There would seem to be no reason not to use the internet for dissemination of webpages and mobile content about ebola in N'Ko, ..." Then Charles Riley of Yale University and Athinkra LLC pointed out some problems. And I found others.

I stand by that estimation, but after some tedious work on formatting and font coding in HTML, do so with greater emphasis on the caveats: "... although there would need to be attention to testing of commonly used systems and of ways to feed or facilitate loading of fonts that include N'Ko."

A summary of findings and lessons learned follows, but first a quick recap of what I did. My first attempt was to copy-paste N'Ko text from a PDF of the FAQ on ebola, which, as previously mentioned, did not work (RTL text was switched to LTR, character order was mixed, and combining diacritics were sometimes not combined). Working from the Word document original, in contrast, was deceptively easy since I was working in Firefox (FF) ver. 32 - copy-paste into the "Compose" screen on Blogger, fix some bullets and do some minor formatting and voilà!

Problems were identified in Microsoft Internet Explorer 8 and Chrome 37 (mostly white boxes but some strings of text), and FF (some alignment of combining diacritics used to mark tones). Varying font commands inherited from the Word source document caused the irregular display of characters, and were fixed (an alternative to try later would be an unformatted text paste). 

However, while this work facilitated a correct display of diacritics on FF 32.02 and MSIE 8 on a computer running Windows XP, the diacritics were all off the mark in FF 32.03 and MSIE 11 on a Windows 7 system (the latter did not have the DejaVu fonts installed, which might be the problem there).

Lessons learned so far:
  • align="left" - Full justification of N'Ko text may space the combining diacritics as if they were characters, leading to misalignment.
  • Font choice/availability - N'Ko text display seems especially sensitive to font commands. I do not have an answer as to why formatted text from a Word document pasted into Blogger looked fine in FF but not in MSIE 8 or Chrome. Installation of a font with N'Ko support may help.
  • Browsers may not be the main issue - Despite initially encountering display issues in MSIE 8 and Chrome more than FF, those could be corrected in the source code. Also, the fact that the most current FF (32.03) and MSIE (11) misplace all diacritics on a computer without a particular font (DejaVu) points back to the font issue. 
  • But browsers are not a non-issue - The N'Ko script normally shows liaisons between characters within words (somewhat like Arabic), but while these show on FF, they did not on MSIE.
  • No bold or italics - Bold N'Ko is apparently not supported by the DejaVu font. Italics are, but while those did display on FF 32, they would not on MSIE 8.
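
As a starting point for testing, the lessons above can be folded into a small Python helper that wraps N'Ko text in explicit markup rather than relying on formatting inherited from Word. This is a sketch under stated assumptions, not a tested recipe: the function name is hypothetical, "DejaVu Sans" is assumed as the name of the installed DejaVu font, "nqo" is the ISO 639-3 code for N'Ko, and the default alignment simply follows the first lesson.

```python
from html import escape


def nko_block(text, font_stack='"DejaVu Sans", sans-serif', align="left"):
    """Wrap N'Ko text in a div with explicit language, direction, font and alignment.

    No bold or italic markup is added, and full justification is avoided
    by default, following the lessons listed above.
    """
    style = f"font-family: {font_stack}; text-align: {align};"
    return (
        f'<div lang="nqo" dir="rtl" style="{escape(style)}">'
        f"{escape(text)}</div>"
    )


# Three sample N'Ko letters, given as escapes so the source stays ASCII.
print(nko_block("\u07d2\u07de\u07cf"))
```

Pasting unformatted text through a wrapper like this would at least make the font and alignment commands identical across browsers, so that remaining differences can be attributed to the browser or to the installed fonts.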

Next steps

This has been a learning experience, but it should be pointed out that with sites like Kanjamadi, N'Ko on the internet is a reality and a potential to be pursued, even as there are issues. (Kanjamadi displays impressively on the same FF32 on XP that did great with the WHO ebola FAQ, but cannot load in MSIE 8 and has the same diacritic problems on MSIE 11 on the Windows 7 machine mentioned above).

Before posting more in N'Ko, it would be helpful to have more feedback on how display issues for this script vary with the available fonts, browsers, and operating systems.

Next steps should have as their goal a simple how-to for organizations wanting to display text in N'Ko on the web - whether for ebola education or any other useful purpose. Similar localization guides could be developed for other West African languages as well.