Showing posts with label Dejavu.

Thursday, September 25, 2014

N'Ko on the web: Review of experience with ebola FAQ

Here's a quick recap of the demonstration of (experiment with) posting the N'Ko text of a WHO FAQ on ebola. Shortly after I posted it, I wrote "There would seem to be no reason not to use the internet for dissemination of webpages and mobile content about ebola in N'Ko, ..." Then Charles Riley of Yale University and Athinkra LLC pointed out some problems. And I found others.

I stand by that assessment, but after some tedious work on formatting and font coding in HTML, I do so with greater emphasis on the caveats: "... although there would need to be attention to testing of commonly used systems and of ways to feed or facilitate loading of fonts that include N'Ko."

A summary of findings and lessons learned follows, but first a brief recap of what I did. My first attempt was to copy-paste N'Ko text from a PDF of the FAQ on ebola, which, as previously mentioned, did not work (RTL text was switched to LTR, character order was mixed, and combining diacritics were sometimes not combined). Working from the original Word document, in contrast, was deceptively easy since I was using Firefox (FF) ver. 32: copy-paste into the "Compose" screen on Blogger, fix some bullets, do some minor formatting, and voilà!
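A quick way to tell whether pasted N'Ko text survived the trip is to look at the characters themselves rather than at the rendering. The sketch below (Python, standard library only) is a heuristic of my own, not a tool used for this post. It reports text containing no characters from the N'Ko block (U+07C0 to U+07FF) at all, and it flags combining marks (Unicode category Mn, which includes the N'Ko tone marks) that do not follow an N'Ko character, one symptom of scrambled character order. It cannot catch every reordering, and it says nothing about display direction, which depends on the page and the browser.

```python
import unicodedata

NKO_FIRST, NKO_LAST = 0x07C0, 0x07FF  # the Unicode N'Ko block


def is_nko(ch: str) -> bool:
    """True if ch lies in the N'Ko block."""
    return NKO_FIRST <= ord(ch) <= NKO_LAST


def check_nko_paste(text: str) -> list[str]:
    """Heuristic warnings for damage typical of copying N'Ko out of a PDF."""
    problems = []
    if not any(is_nko(ch) for ch in text):
        problems.append("no N'Ko characters found")
    base = None  # most recent non-combining character
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Mn":
            # A combining mark (e.g. an N'Ko tone mark) should sit on an N'Ko
            # character; one at the start of the text, or after a space or a
            # non-N'Ko character, suggests the character order was scrambled.
            if base is None or not is_nko(base):
                problems.append(f"orphaned combining mark U+{ord(ch):04X} at index {i}")
        else:
            base = ch
    return problems
```

For example, a letter followed by its tone mark ("\u07CA\u07EB") produces no warnings, while the same two characters in reverse order produce one warning for the mark at index 0.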

Problems were identified in Microsoft Internet Explorer (MSIE) 8 and Chrome 37 (mostly white boxes, though some strings of text did display), and in FF (some misalignment of the combining diacritics used to mark tones). Varying font commands inherited from the Word source document caused the irregular display of characters; these were fixed in the HTML (an alternative to try later would be an unformatted text paste).
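The inherited font commands were fixed by hand in the HTML. As a sketch of how that clean-up could be automated, assuming the pasted HTML carries its fonts in double-quoted style attributes the way Word-derived markup usually does, the following Python removes font-family declarations (including prefixed variants such as mso-bidi-font-family) while keeping other inline styles. It is a regular-expression shortcut rather than a real HTML parser, so it ignores <font face> tags and single-quoted attributes; the unformatted paste mentioned above would avoid the problem altogether.

```python
import re

# One *font-family declaration inside a style attribute. Word-derived HTML often
# quotes family names as &quot;...&quot;, and that entity contains a ';', so it is
# matched as a unit before falling back to "anything except ';'".
FONT_DECL = re.compile(
    r"(?<![\w-])(?:[\w-]*-)?font-family\s*:(?:&quot;|[^;])*;?\s*", re.IGNORECASE
)
STYLE_ATTR = re.compile(r'\s+style="([^"]*)"', re.IGNORECASE)


def strip_inline_fonts(html: str) -> str:
    """Remove inline font-family declarations, keeping any other inline styles."""
    def clean(match: re.Match) -> str:
        rest = FONT_DECL.sub("", match.group(1)).strip()
        # Drop the whole attribute if nothing but font declarations was in it.
        return f' style="{rest}"' if rest else ""
    return STYLE_ATTR.sub(clean, html)
```

With no font-family left in the markup, a single font choice made elsewhere (for instance in the page's stylesheet) can apply to all of the N'Ko text.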

However, while this work allowed the diacritics to display correctly in FF 32.02 and MSIE 8 on a computer running Windows XP, the diacritics were all off the mark in FF 32.03 and MSIE 11 on a Windows 7 system (that machine did not have the DejaVu fonts installed, which might be the problem there).

Lessons learned so far:
  • align="left" - Full justification of N'Ko text may space the combining diacritics as if they were characters, leading to misalignment.
  • Font choice/availability - N'Ko text display seems especially sensitive to font commands. I do not have an answer to why formatted text from a Word document pasted into Blogger looked fine in FF but not in MSIE 8 or Chrome. Installation of a font with N'Ko support may help.
  • Browsers may not be the main issue - Although MSIE 8 and Chrome initially showed more display problems than FF, those could be corrected in the source code. Also, the fact that the most current FF (32.03) and MSIE (11) misplace all diacritics on a computer without a particular font (DejaVu) points back to the font issue.
  • But browsers are not a non-issue - The N'Ko script normally shows liaisons between characters within words (somewhat like Arabic), but while these show in FF, they did not in MSIE.
  • No bold or italics - Bold N'Ko is apparently not supported by the DejaVu font. Italics are, but although they displayed in FF 32, they did not in MSIE 8.
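The lessons above can be folded into a small template. This hypothetical helper (the name nko_paragraph and its defaults are mine, not part of Blogger or any library) wraps N'Ko text in a paragraph that declares right-to-left direction and the ISO 639 language code nqo, names DejaVu Sans ahead of a generic fallback, avoids full justification, and turns off bold and italics. One assumption to flag: it uses text-align: start, which for dir="rtl" aligns to the right edge, instead of the literal align="left" noted in the list. Naming a font in CSS only helps if that font is installed or supplied to the browser (for example through an @font-face rule, not shown here).

```python
from html import escape


def nko_paragraph(text: str, fonts: tuple[str, ...] = ("DejaVu Sans", "sans-serif")) -> str:
    """Wrap N'Ko text in a <p> with explicit direction, language and font stack."""
    # Quote multi-word family names; leave generic families such as sans-serif bare.
    stack = ", ".join(f"'{f}'" if " " in f else f for f in fonts)
    style = (
        f"font-family: {stack}; "
        "text-align: start; "                      # avoid full justification
        "font-weight: normal; font-style: normal"  # no bold or italics
    )
    return f'<p dir="rtl" lang="nqo" style="{style}">{escape(text)}</p>'
```

Keeping these settings in one place means a later change, such as adding another N'Ko-capable font to the stack, only has to be made once.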
Next steps

This has been a learning experience, but it should be pointed out that with sites like Kanjamadi, N'Ko on the internet is a reality and a potential to be pursued, even if issues remain. (Kanjamadi displays impressively in the same FF 32 on XP that handled the WHO ebola FAQ well, but cannot load in MSIE 8 and has the same diacritic problems in MSIE 11 on the Windows 7 machine mentioned above.)

Before posting more in N'Ko, it would be helpful to have more feedback on display issues for this script as a function of available fonts, browsers, and operating systems.
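One way to organize that feedback is a simple test matrix. The sketch below records the configurations reported in this post, assuming (as the Windows 7 note implies) that the XP machine had DejaVu installed, and lists the combinations nobody has reported on yet. The browser and OS lists are illustrative only; Chrome is listed but has no entry because the post does not say which system it ran on.

```python
from itertools import product

# Results reported in this post: (browser, OS, DejaVu installed) -> outcome.
OBSERVED = {
    ("Firefox", "Windows XP", True): "diacritics OK after clean-up",
    ("MSIE", "Windows XP", True): "diacritics OK after clean-up",
    ("Firefox", "Windows 7", False): "diacritics misplaced",
    ("MSIE", "Windows 7", False): "diacritics misplaced",
}

# Illustrative dimensions of the matrix; extend with more browsers, systems, fonts.
BROWSERS = ("Firefox", "MSIE", "Chrome")
SYSTEMS = ("Windows XP", "Windows 7")
DEJAVU = (True, False)


def untested(observed: dict) -> list[tuple]:
    """Every browser/OS/font combination that has no reported result yet."""
    return [combo for combo in product(BROWSERS, SYSTEMS, DEJAVU)
            if combo not in observed]
```

Among the gaps is MSIE on Windows 7 with DejaVu installed, which is the test that would show whether the missing font explains the misplaced diacritics.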

Next steps should have as their goal a simple how-to for organizations wanting to display text in N'Ko on the web - whether for ebola education or any other useful purpose. Similar localization guides could be developed for other West African languages as well.

Thursday, December 12, 2013

The "eng" times for unified capital ŋ?

Perhaps the most widely used "extended Latin" character, the letter ŋ (pronounced "eng" or "engma"), has two different upper case forms that are not used interchangeably, but used alternately by different groups of languages. One of these resembles an N but with a descending hook on the right leg ("N-form"), and the other resembles a larger version of the ŋ ("n-form"). The latter, in turn, has stylistic variations in which the right leg either descends below the line, or stays above it.
Forms of letter "eng"*

The current status and future of these dual capital forms, and how best to handle them technically for display, were the subject of a brief discussion last month (Nov. 2013) in the wake of a proposed change in the DejaVu fonts that was first brought up on the DejaVu developers list:
In fact, this is a potential issue that has long been known, since different regions tend to use different forms, and different fonts have one or the other form. Consequently, there are many situations where one doesn't know what form the capital will take. A larger issue is whether there needs to be a new Unicode character for one of the uppercase forms, which would be called "disunification" of the existing capital letter.

Background

In linguistic terms, the letter ŋ stands for a "velar n," which is pronounced as "ng" in the English word, "king." If it were used in standard English spelling, you might come across something like "siŋiŋ a soŋ." It is used in the orthographies of a range of languages from Saami in northern Europe to a number of African languages (mainly in the west and central regions, but also Dinka in South Sudan and Karamojong in Uganda), to some Aboriginal languages in Australia. (It also figures in the International Phonetic Alphabet, which of course does not need an upper case).

In many languages, the eng is distinguished from "ng," which is a prenasalized "g" pronounced "n-g," and in any event the ŋ is especially useful at the beginning of words.** In the Fula language, for instance, the difference between "ng" and "ŋ" at the beginning of a number of words is meaningful. The root ŋor-, at least in Maasinankoore, has to do with a riverbank, while ngor- is derived from the root for male, wor-. A hippopotamus might be referred to as ngabbu, but the root ŋabb- has to do with climbing something or mounting a horse. Ngari means "came" or "arrived"; ŋari means "beauty." There are other such examples.

Personally, it was in Togo that I first encountered the letter ŋ, in the Ewe name for Peace Corps and when learning some of the Ewe and Kabiye languages, and later in Mali when learning Fula and Bambara. The capital letter was always in the "n-form" in those places and in all the African-language texts I ever saw.

Later I found that a reason for that consistency of usage probably had to do with efforts to standardize letter forms, notably with the African Reference Alphabet proposed 35 years ago in Niamey by the Meeting of Experts on the Transcription and Harmonization of African Languages. (The glyph used in the pre-Unicode African special character standard ISO 6438 [1983] varies for some reason, with an earlier version having the n-form, and later versions from the 1990s showing the N-form.)
Rotated G

One aspect of the graphical history of the letter ŋ is worth noting before moving on: Apparently early printers would sometimes rotate a capital G to produce this character. So in effect the so-called "n-form" capital ŋ actually also looks like it could be called "turned-G form" capital ŋ. (I've produced the one at the right for comparison purposes only.***)

What's the problem then?

The problem with the two main forms (or "glyphs") of the capital ŋ, "n-form" and "N-form," boils down to not knowing which form you are going to get, since different fonts have one or the other form, and the alternative forms are preferred or required in different places for different languages. This is because the two main forms are treated as the same character in Unicode, with the same "code point" (which software uses to call up the appropriate symbol from the selected or default font).
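The single code point is easy to see with Python's standard unicodedata module: the small and capital eng form an ordinary case pair, and nothing in the stored text records which capital shape the writer had in mind. That choice is left entirely to the font.

```python
import unicodedata

SMALL_ENG, CAPITAL_ENG = "\u014B", "\u014A"

# Whatever shape a font draws, the stored capital is this one code point.
print(f"U+{ord(CAPITAL_ENG):04X}", unicodedata.name(CAPITAL_ENG))  # U+014A LATIN CAPITAL LETTER ENG
print(SMALL_ENG.upper() == CAPITAL_ENG)  # True
print("siŋiŋ a soŋ".upper())             # SIŊIŊ A SOŊ
```

An Ewe text and a Saami text capitalized this way contain exactly the same character, so the difference between the n-form and the N-form exists only at display time.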

These are not new issues, but they are now getting more attention (which may actually be a good sign, to the extent that it reflects more going into print in the languages concerned).

From where we are now, there appear to be two options:

  1. Continue as is, but develop means for locales or language preferences to select the appropriate form ("glyph") of upper case ŋ from fonts that have the desired glyph. However, the technical feasibility of this is apparently an issue.
  2. "Disunify" the capital ŋ into two characters, with one of the major forms being given a new Unicode code point. This would be disruptive, and extremely so if it also required a new code point for a paired lowercase ŋ (with the exact same appearance as the one used throughout this posting): all kinds of existing digital texts, fonts, and software would have to adjust for the change in some significant set of languages.
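Option 1 amounts to a lookup from language to preferred shape, with the font doing the actual substitution (OpenType fonts have a "locl" feature for localized forms that can be keyed on language, though whether a given font implements it for eng is up to that font). Below is a minimal sketch of only the lookup step, assuming a hypothetical table: n-form for Ewe, Fula, and Bambara, as described in this post, and N-form for Northern Saami. The character stored in the text would remain U+014A either way.

```python
# Hypothetical table: primary language subtag -> preferred capital-eng shape.
PREFERRED_CAPITAL_ENG = {
    "ee": "n-form",  # Ewe
    "ff": "n-form",  # Fula
    "bm": "n-form",  # Bambara
    "se": "N-form",  # Northern Saami
}


def capital_eng_form(lang_tag: str, default: str = "font default") -> str:
    """Pick a shape preference from a tag such as 'ff-SN' or 'se_NO'."""
    # Accept both BCP 47 hyphens and POSIX-style underscores; use the primary subtag.
    primary = lang_tag.replace("_", "-").split("-")[0].lower()
    return PREFERRED_CAPITAL_ENG.get(primary, default)
```

Keying on the primary language subtag alone is the simplest choice; if usage turned out to differ by country for the same language, the region subtag could be consulted as well.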

Unicode in principle calls for a separate code point for each character, so one question is this: with two very different forms/glyphs historically used and preferred (with varying degrees of intensity) in different regions, how was the decision made to treat these as variants?

I'm actually looking over some past discussions to see how the issue and alternative approaches were treated. A 20+ message exchange on A12n-collaboration on 4-6 April 2002 among Peter Constable, John Hudson, Andrew Cunningham, and me dealt mainly with forms used in Africa and, to a lesser degree, Australia, with mention of Saami. (I am reconstituting the 2002-2004 archive of this list to post on A12n-archive.) However, that exchange treated all forms as variants.

Ultimately, however, the main question is the best way forward for all concerned. It is worth noting that Sjur Moshagen's otherwise well-framed proposal to disunify (at the end of the recent email discussions cited above) would put all the burden of change on Africa and on anyone working with the numerous African languages that have the ŋ in their orthographies. Disunification the other way would similarly impose costs on those using Saami and Australian Aboriginal languages, so it's a difficult set of choices.

A Niger exception?

A quick note about Denis Jacquerye's statement in the recent email discussions that in Niger, the N-form capital ŋ is more common - this despite the n-form being established in Niger's orthographies and in the "harmonized" orthographies used across the region. It would be of interest to see any examples, but one wonders if a limited choice of fonts might have been a major factor. A larger issue in terms of planning would be the cost of introducing or establishing such a variation ("dis-harmonization"?) in a wider regional usage, and how that might impact font development, software localization (Fula is a regional cross-border language; Zarma is part of the cross-border Sonrai cluster, for which localization is being done), etc. This would be even more problematic if Unicode were to decide to "disunify" the character.


* Source of illustration: Wikimedia Commons
** In the orthographies of many East African languages, such as that for standard Swahili, an apostrophe after ng is used to indicate this difference: ng' = ŋ.
*** "Turned-g" is actually a character used to transliterate text in the Georgian language script.