Thursday, December 12, 2013

The "eng" times for unified capital ŋ?

Perhaps the most widely used "extended Latin" character, the letter ŋ (pronounced "eng" or "engma"), has two different upper case forms that are not used interchangeably but are used alternately by different groups of languages. One of these resembles an N but with a descending hook on the right leg ("N-form"), and the other resembles a larger version of the ŋ ("n-form"). The latter, in turn, has stylistic variations in which the right leg either descends below the line or stays above it.
Forms of letter "eng"*

The current status and future of these dual forms of the capital, and how best to handle them technically for display, were the subject of a brief discussion last month (Nov. 2013) in the wake of a proposed change in the DejaVu fonts that was first brought up on the developers list for the latter.
In fact, this is a potential issue that has long been known, since different regions tend to use different forms, and different fonts have one or the other form. Consequently, there are many situations where one doesn't know what form the capital will take. A larger issue is whether there needs to be a new Unicode character for one of the uppercase forms - what would be called "disunification" of the existing capital letter.

Background

In linguistic terms, the letter ŋ stands for a "velar n," which is pronounced as "ng" in the English word, "king." If it were used in standard English spelling, you might come across something like "siŋiŋ a soŋ." It is used in the orthographies of a range of languages from Saami in northern Europe to a number of African languages (mainly in the west and central regions, but also Dinka in South Sudan and Karamojong in Uganda), to some Aboriginal languages in Australia. (It also figures in the International Phonetic Alphabet, which of course does not need an upper case).

In many languages, the eng is distinguished from "ng," which is a prenasalized "g" pronounced "n-g," and in any event it is especially useful at the beginning of words.** In the Fula language, for instance, the difference between "ng" and "ŋ" at the beginning of a number of words is meaningful. The root ŋor-, at least in Maasinankoore, has to do with a riverbank, while ngor- is derived from the root for male, wor-. A hippopotamus might be referred to as ngabbu, but the root ŋabb- has to do with climbing something or mounting a horse. Ngari means "came" or "arrived"; ŋari means "beauty." There are other such examples.

Personally, it was in Togo that I first encountered use of the letter ŋ, in the Ewe name for Peace Corps and when learning some of the Ewe and Kabiye languages. I encountered it again later in Mali when learning Fula and Bambara. The capital letter was always in the "n-form" in those places and in everything I ever saw in African languages.

Later I found that a reason for that consistency of usage probably had to do with efforts to standardize letter forms, notably with the African Reference Alphabet proposed 35 years ago in Niamey by the Meeting of Experts on the Transcription and Harmonization of African Languages. (The glyph used in the pre-Unicode African special character standard ISO 6438 [1983] varies for some reason, with an earlier version having the n-form, and later versions from the 1990s showing the N-form.)
Rotated G

One aspect of the graphical history of the letter ŋ is worth noting before moving on: Apparently early printers would sometimes rotate a capital G to produce this character. So in effect the so-called "n-form" capital ŋ actually also looks like it could be called "turned-G form" capital ŋ. (I've produced the one at the right for comparison purposes only.***)

What's the problem then?

The problem with the two main forms (or "glyphs") of the capital ŋ - "n-form" and "N-form" - boils down to not being sure which form you are going to get, since different fonts have one or the other form, and the alternative forms are preferred or required in different places for different languages. This is because the two main forms are treated as the same character in Unicode, with the same "code point" (which computer software uses to call up the appropriate symbol from the selected or default font).
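
To make the single-code-point situation concrete, here is a minimal Python sketch using only the standard library's unicodedata module. It shows that the capital and the small eng each have exactly one code point (U+014A and U+014B), so the text itself says nothing about which capital form is wanted. The helper name describe is just for illustration.

```python
import unicodedata

CAPITAL_ENG = "\u014A"  # the one code point shared by the N-form and n-form capitals
SMALL_ENG = "\u014B"    # the lowercase eng used throughout this post


def describe(ch):
    """Return (code point label, Unicode character name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


for ch in (CAPITAL_ENG, SMALL_ENG):
    print(ch, *describe(ch))

# Case mapping is one-to-one between these two code points, so upper-casing
# a word always yields U+014A; which shape appears is decided by the font.
print(SMALL_ENG.upper() == CAPITAL_ENG)
print(CAPITAL_ENG.lower() == SMALL_ENG)
```

Whichever capital shape a reader sees for U+014A therefore depends entirely on the font in use, not on anything stored in the text.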

These are not new issues, but they are now getting more attention (which may actually be a good sign, to the extent that more is going into print in the languages concerned).

From where we are now, there appear to be two options:

  1. Continue as is, but develop means for locales or language preferences to select the appropriate form ("glyph") of upper case ŋ from fonts that have the desired glyph. However the technical feasibility is apparently an issue. 
  2. "Disunify" the capital ŋ into two characters, with one of the major forms being given a new Unicode code point. This would be disruptive, but extremely so if it also required a new code point for a paired lowercase ŋ (with the exact same appearance as the one used throughout this posting) - all kinds of existing digital texts, fonts, and software would have to adjust for the change in some significant set of languages.
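
As a rough illustration of the first option, the selection logic could look something like the Python sketch below. Everything in it is hypothetical: the table name PREFERRED_FORM, the function capital_eng_form, and the specific language-to-form entries are assumptions based on the usage described in this post, not an existing registry or API. A real solution would have to live in fonts and rendering software rather than in a lookup like this.

```python
# Hypothetical preference table: primary language subtag -> preferred form of
# the capital eng (U+014A). Entries are illustrative assumptions only.
PREFERRED_FORM = {
    "se": "N-form",   # Northern Saami
    "ee": "n-form",   # Ewe
    "kbp": "n-form",  # Kabiye
    "ff": "n-form",   # Fula
    "bm": "n-form",   # Bambara
}


def capital_eng_form(lang_tag, font_default="N-form"):
    """Choose a glyph form for U+014A from a language tag such as "ff-ML".

    When the text is untagged or the language is not in the table, fall back
    to whatever the font provides by default.
    """
    if not lang_tag:
        return font_default
    primary = lang_tag.replace("_", "-").split("-")[0].lower()
    return PREFERRED_FORM.get(primary, font_default)


print(capital_eng_form("ff-ML"))  # Fula as used in Mali
print(capital_eng_form("se-NO"))  # Northern Saami as used in Norway
print(capital_eng_form(None))     # untagged text gets the font's default
```

The fallback branch is where the present uncertainty lives: untagged text simply gets whichever form the font designer chose.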

Unicode in principle calls for a separate code point for each character, so one question is this: with two very different forms/glyphs being historically used and preferred (with varying degrees of intensity) in different regions, how was the decision made to treat these as variants?

I'm actually looking over some past discussions to see how the issue and alternative approaches were treated. A 20+ message exchange on A12n-collaboration on 4-6 April 2002 among Peter Constable, John Hudson, Andrew Cunningham, and me dealt mainly with forms used in Africa and to a lesser degree Australia, with mention of Saami. (I am reconstituting the 2002-2004 archive of this list to post on A12n-archive.) However, that exchange treated all forms as variants.

Ultimately however the main question is the best way forward for all concerned. It is worth noting that Sjur Moshagen's otherwise well-framed proposal to disunify (at the end of the recent email discussions cited above) would put all the burden of change on Africa and anyone working with the numerous African languages which have the ŋ in their orthographies. Disunification the other way would similarly cost those using Saami and Australian Aboriginal languages - so it's a difficult set of choices.

A Niger exception?

A quick note about Denis Jacquerye's statement in the recent email discussions that in Niger, the N-form capital ŋ is more common - this despite the n-form being established in Niger's orthographies and in the "harmonized" orthographies used across the region. It would be of interest to see any examples, but one wonders if a limited choice of fonts might have been a major factor. A larger issue in terms of planning would be the cost of introducing or establishing such a variation ("dis-harmonization"?) in a wider regional usage, and how that might impact font development, software localization (Fula is a regional cross-border language; Zarma is part of the cross-border Sonrai cluster, for which localization is being done), etc. This would be even more problematic if Unicode were to decide to "disunify" the character.


* Source of illustration: Wikimedia Commons
** In the orthographies of many East African languages, such as that for standard Swahili, an apostrophe after ng is used to indicate this difference: ng' = ŋ.
*** "Turned-g" is actually a character used to transliterate text in the Georgian language script.

1 comment:

Don said...

FYI, there is a discussion on this topic on the Unicode list under the title "Engmagate?"