Wednesday, August 31, 2016

Missing "macrolanguages" of Africa

Screenshot from VOA's Kinyarwanda/Kirundi site
The Voice of America (VOA) recently had a job opening for "International Broadcaster (Multimedia) (Kirundi/Kinyarwanda)." Kirundi and Kinyarwanda are the mother tongues, national languages, and co-official languages in, respectively, Burundi and Rwanda. And they are mutually intelligible, with only minor differences, such that apparently a fluent speaker of either could work on a program serving speakers of both. But there is no term covering both - unless one counts the hyphenated Rwanda-Rundi - and no language coding category to cover material designed for use across the two.

This is a situation encountered with many languages in Africa, and one for which there is at least one potential solution - the neologism and language coding category "macrolanguage." There are actually some macrolanguages defined in Africa, but these are few, and as I discuss below, kind of accidental. Is it time to systematically identify (and code) macrolanguages in Africa?

What defines a language?

For most of us, the distinction between languages seems pretty straightforward. But beyond the most spoken international languages - those used officially by the United Nations or ones you are likely to see on a school curriculum - the situation is often more complex. Sometimes two or more closely related languages are so similar that their speakers can understand each other, but sometimes variations within one language can make understanding difficult. An earlier posting on this blog looked at the notion of "neighbor languages" in Scandinavia and Africa. A broader consideration of these issues by Columbia University's John McWhorter suggests that we're really all speaking dialects, some of which benefit from written forms and, one might add, status, resources, and policy support. There is some truth to the saying that "A language is a dialect with an army and a navy."

However, the issues of what to call a "language" and where to draw the boundaries between it and another "language" are still of practical importance for communication (standardization, references, ICT use) and planning (government, business, education). There are two broad approaches in linguistics to doing this, corresponding with the splitter/lumper (or joiner) approaches to categorizing: one focusing more on distinctions, and the other focusing more on commonalities.

Without going too deeply into that discussion, which gets more complicated when accounting for issues of identity, names, written forms, and national boundaries, suffice it to say that in considering African languages, there are many situations where one encounters the splitter/lumper choice.

The major reference on the world's languages, Ethnologue, takes a more splitter approach, which means that speech varieties that are closely related and interintelligible may be classified as separate languages. It is their estimate of the number of languages in Africa (over 2000) that is most commonly cited, but there are other more conservative estimates. A good academic discussion of this issue entitled "How many languages are there in Africa?" was published in 2004 by Jouni Filip Maho (his estimate is under 1500).

What is a "macrolanguage"?

To make the story brief, "macrolanguage" was not a term used in linguistic description before the inauguration of the ISO 639-3 system for encoding all languages in the late 2000s. Since that system is based on Ethnologue's "splitter" data, a new category was needed to accommodate existing codes in the earlier, less comprehensive parts of ISO 639 (1 & 2) that in many cases were more "lumper" in approach. The term macrolanguage was in effect a "shim," to borrow someone else's term, to fit the two systems together.

There are by my count 14 macrolanguages listed for Africa (names linked to the Ethnologue macrolanguage pages): Akan; Arabic; Dinka; Fulah; Gbaya; Grebo; Kalenjin; Kanuri; Kongo; Kpelle; Malagasy; Mandingo; Oromo; and Swahili. There could be others.

That brings us back to Kinyarwanda and Kirundi. Is the relationship between them any different - any more distant - than the relationships within the established macrolanguages above? One difference, as mentioned above, is that there is no common name to make it easy, and another is that they are dominant in different countries - perhaps analogous to the situation of Scandinavian languages?

Another curious situation is that of Mandingo, which includes several western Manding languages, but not Bambara and Jula (Dyula). Even if the latter two were considered too different from the other Manding tongues, they are close enough that one could localize software for the two together. Keep in mind also that the emerging literary standard N'Ko covers all Manding languages (in a different alphabet). Should the Mandingo macrolanguage be extended to include them all?

The four languages of southwestern Uganda - Kiga, Nkore, Nyoro, and Tooro - are close enough to be covered by Runyakitara, a proposed (but not encoded) standard which is being used in various ways, including at least some teaching and a localization of the Google interface. Should these four be considered a macrolanguage under perhaps that same name, thus finally providing a code for localization in Runyakitara?

And there are other examples around the continent that could be discussed.

What good would more macrolanguages do?

The first benefit of identifying more macrolanguages would be in language coding - the very environment in which the term was first used. The language of VOA's website for its Kinyarwanda/Kirundi service is coded as "rw" (Kinyarwanda) since there is no macrolanguage code covering both languages. Likewise, in many cases, the grouping of very close and mutually intelligible languages as a macrolanguage could facilitate localization of software and apps to serve larger populations - and those larger markets could make it more likely that such localization would be pursued and maintained.
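To make the coding point concrete, here is a minimal sketch in Python of how a macrolanguage code lets software treat closely related languages as one audience. The Akan and Swahili entries reflect my understanding of the existing ISO 639-3 groupings; the label "x-rwarun" is purely hypothetical (no such code exists) and stands in for the missing Kinyarwanda/Kirundi grouping. The function names are likewise just illustrative.

```python
# Existing ISO 639-3 macrolanguages mapped to their member language codes.
MACROLANGUAGES = {
    "aka": {"fat", "twi"},  # Akan: Fanti, Twi
    "swa": {"swc", "swh"},  # Swahili: Congo Swahili, Swahili (individual)
}

# HYPOTHETICAL grouping for illustration only: "x-rwarun" is an invented
# label, not a registered code. "kin" and "run" are the real ISO 639-3
# codes for Kinyarwanda and Kirundi (Rundi).
HYPOTHETICAL_GROUPS = {
    "x-rwarun": {"kin", "run"},
}


def group_for(code, table):
    """Return the group label that contains `code`, or `code` itself."""
    for label, members in table.items():
        if code == label or code in members:
            return label
    return code


def same_audience(code_a, code_b, table):
    """True if both codes fall under the same group in `table`."""
    return group_for(code_a, table) == group_for(code_b, table)


if __name__ == "__main__":
    # Twi and Fanti share the Akan macrolanguage code.
    print(same_audience("twi", "fat", MACROLANGUAGES))
    # With only the existing table, Kinyarwanda and Kirundi stay separate.
    print(same_audience("kin", "run", MACROLANGUAGES))
    # Adding the hypothetical grouping lets content be tagged for both.
    combined = {**MACROLANGUAGES, **HYPOTHETICAL_GROUPS}
    print(same_audience("kin", "run", combined))
```

With only the existing table, content for Kinyarwanda and Kirundi has to be tagged as one or the other, which is the VOA situation described above; a shared code would let a single tag cover both.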

Another benefit would be to complement the tendency in language coding towards seeking more granularity, by recognizing natural groupings of languages (for more on this, see a message to the IETF-languages list last May). In effect, this would provide more balance between splitting and lumping/joining.

In the broader picture, identifying macrolanguages could have benefits for policymaking and program development involving languages within macrolanguage groups, by calling attention to the closely related languages. Especially where foreigners are involved, projects may overlook such relationships and the potential resources they may provide. For example, materials development for education and various communication needs might benefit from tapping efforts and resources in closely related languages.

(Minor edits and image added, 2 Sep. 2016)