Sunday, January 31, 2016

Wikipedia at 15 & African languages

Image source: Wikimedia Commons via GeoEd Trek blog
Wikipedia celebrated its 15th anniversary earlier this month, so this seems an opportune time to look at how its African language versions are doing. They number 37 by my count (including Arabic), though it should be noted at the outset that most of them are only about 11-12 years old. In short, while a few African language editions are doing quite well, many are stagnant.

Background & history


Wikipedia has always been a multilingual enterprise, building on Jimmy Wales' vision of delivering "a free encyclopedia of the highest possible quality to every single person on the planet in their own language."[1] There are currently 291 different language editions (280 active), of which the English edition was the first to be launched, on 15 January 2001. During the rest of that year, 13 more European languages were added (plus a second, "simple" edition of English), along with the first African language, Afrikaans, for a total of 16 editions (though efforts on some other languages were also initiated[2]). In 2002, 12 more language editions were added, mostly European or Asian, and in 2003, 25 more, of which Arabic may count as the second African language edition.

By the end of 2003 there were 53 editions in total, and 112 more were added in 2004, including 18 African languages: Afar, Hausa, Kinyarwanda, Lingala, Malagasy, Oromo, Shona, Swati, Somali, Sotho (Sesotho), Swahili, Tsonga, Tswana, Twi, Venda, Xhosa, Yoruba, and Zulu. The year 2005 saw the total increase from 165 to 201, including 12 more African languages: Amharic, Bambara, Chewa/Nyanja, Ewe, Fula, Gikuyu, Igbo, Kirundi, Sango, Tigrinya, Tumbuka, and Wolof.

In 2006, the story starts to get more complicated. The total number of Wikipedia editions increased to 257, a net increase of 56, including six more from Africa: Herero, Kanuri, Kongo, Kuanyama, Luganda, and Ndonga. The Afrikaans Wikipedia became the first African language edition to pass 5,000 articles. However, the pattern of stagnation in many other African language editions also started to become apparent. On the other hand, beginning with a couple of presentations at Wikimania 2006[3] and the subsequent founding of the AfrophoneWikis list, African language editions as a group started to get more attention (although this geographic grouping has no special standing within Wikimedia, and in fact the idea met with some initial pushback).

In 2007 the Akan and Kabyle editions were started; in 2008, Egyptian Arabic. Northern Sotho came later. Around this time, more attention began to be paid to localizing the interfaces of African language editions, and there were also proposals to close African language Wikipedias (and other Wikimedia projects).

Growth of African language Wikipedias


Growth of African language Wikipedias has generally been slow, though some editions have experienced bursts of growth. In 2009, Swahili passed Afrikaans to become the largest African language edition (apart from Arabic), and was in turn passed by Yoruba in 2011. In recent years, the Malagasy Wikipedia has grown quickly, becoming the largest African language Wikipedia in 2012. (The Malagasy Wiktionary has also grown to be one of the largest overall.) In some cases a single individual is the main driver in adding content. Similar patterns of growth can be observed in some smaller Wikipedias.

The Neverness blog has a good series of articles going back several years that discuss aspects of growth of African language Wikipedias, with focus on South Africa.

A problem common to many African language editions of Wikipedia has been the quality of articles, many of which are very short and in some cases inconsistent with standard orthographies. These are the kinds of issues one might expect with less-resourced languages, but they also point to a need for more attention from language experts.

Closing Wikipedias and ... the incubator


There have been 48 proposals to close various African language projects, about half of them for Wikipedias and the other half for other Wikimedia projects (Wiktionary, Wikiquote, etc.). Most of the Wikipedia closure proposals did not pass, though for some there were even second attempts. At one point, one got the impression that there was more focus on closing African language projects with little activity than on developing them (see the brief discussion on this blog of the first proposal to close the Xhosa Wikipedia, in 2008).

The concept of "closing" in Wikimedia parlance needs explaining. Usually, when a project or initiative is said to be "closed," it means something final: it's over. With Wikipedias, however, it's not that simple. Technically, "closed" means no further open participation, with the edition moved to an intermediate level called the "Incubator," which in turn requires some explanation.

The Wikimedia Incubator is intended to serve "as a platform where anyone can build up a community in a certain language edition of a Wikimedia project." Most projects there are new, but some, like Wikipedias for Afar, Herero, Kanuri, Kuanyama, and Ndonga, are "closed" projects.

There are different opinions about the benefits of keeping struggling projects open versus closing them and sending them down to the Incubator, and these emerge with each proposal to close a project. It should be noted that the criteria for closure have evolved to the point where inactivity alone is no longer considered sufficient justification.

Two African language Wikipedias were started in the Incubator: Egyptian Arabic and Northern Sotho. A number of other African language projects are currently in the Incubator, including Krio, which I posted about previously. Another one I'll call attention to is Kabiye of northern Togo, the first African language I actually studied. (For others, see the full list on the Incubator.)

Data on African language Wikipedias


The table below gives information on 36 active African language editions of Wikipedia.[4] Arabic, which is larger than the others by an order of magnitude or more, is omitted. "Depth" is an index intended to indicate how often articles are edited.[5] (For comparison, the depth index for Arabic is 224, for English 913, and for French 206.)


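| Language | Articles | Total pages | Edits | Admins | Total users | Active users | Depth |
|---|---|---|---|---|---|---|---|
|  | 81,142 | 215,494 | 812,070 | 3 | 11,439 | 46 | 10 |
|  | 38,446 | 93,859 | 1,470,439 | 12 | 75,385 | 160 | 33 |
|  | 31,775 | 77,451 | 1,007,865 | 9 | 23,199 | 58 | 27 |
|  | 31,144 | 53,515 | 550,296 | 1 | 13,189 | 40 | 5 |
|  | 14,667 | 120,315 | 754,615 | 7 | 74,209 | 96 | 325 |
|  | 13,023 | 42,417 | 343,881 | 3 | 20,348 | 36 | 41 |
|  | 3,777 | 15,381 | 163,889 | 2 | 13,310 | 66 | 101 |
|  | 2,782 | 4,148 | 22,253 | 1 | 2,322 | 6 | 1 |
|  | 2,530 | 7,911 | 64,331 | 0 | 5,609 | 20 | 37 |
|  | 2,458 | 6,796 | 47,677 | 0 | 5,892 | 15 | 22 |
|  | 2,119 | 6,881 | 112,968 | 2 | 6,106 | 12 | 83 |
|  | 1,784 | 4,961 | 77,554 | 0 | 5,598 | 12 | 50 |
|  | 1,356 | 3,300 | 33,392 | 0 | 5,360 | 10 | 21 |
|  | 1,116 | 2,579 | 42,204 | 1 | 5,146 | 9 | 28 |
|  | 1,047 | 5,437 | 59,324 | 1 | 6,087 | 11 | 192 |
|  | 1,042 | 5,263 | 101,068 | 2 | 8,178 | 11 | 315 |
|  | 847 | 2,225 | 18,884 | 0 | 3,700 | 12 | 22 |
|  | 725 | 3,947 | 38,618 | 0 | 8,153 | 23 | 193 |
|  | 709 | 2,598 | 21,302 | 0 | 3,556 | 11 | 58 |
|  | 682 | 3,676 | 24,705 | 0 | 4,180 | 14 | 130 |
|  | 559 | 1,698 | 17,733 | 0 | 5,957 | 12 | 43 |
|  | 503 | 2,252 | 21,160 | 1 | 4,510 | 8 | 114 |
|  | 489 | 1,667 | 19,694 | 0 | 4,211 | 9 | -- |
|  | 462 | 2,354 | 28,632 | 0 | 5,268 | 11 | -- |
|  | 440 | 1,671 | 23,783 | 0 | 3,393 | 8 | -- |
|  | 412 | 1,872 | 37,384 | 2 | 4,168 | 10 | -- |
|  | 358 | 2,971 | 36,291 | 1 | 6,023 | 8 | -- |
|  | 317 | 3,516 | 48,123 | 1 | 6,716 | 16 | -- |
|  | 293 | 2,803 | 18,657 | 0 | 4,900 | 10 | -- |
|  | 281 | 1,901 | 29,322 | 1 | 4,312 | 10 | -- |
|  | 252 | 1,798 | 20,536 | 1 | 5,771 | 17 | -- |
|  | 240 | 1,472 | 20,413 | 0 | 3,447 | 6 | -- |
|  | 189 | 1,463 | 17,037 | 0 | 3,515 | 9 | -- |
|  | 182 | 1,610 | 21,284 | 0 | 4,204 | 9 | -- |
|  | 161 | 1,470 | 19,764 | 0 | 4,302 | 16 | -- |
|  | 160 | 1,407 | 17,139 | 1 | 3,708 | 13 | -- |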
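To make the depth index a little more concrete, here is a minimal Python sketch of the formula given in note 5: Depth = (Edits/Articles) × (Non-Articles/Articles) × (1 − Stub-ratio). It assumes that "non-articles" simply means total pages minus articles. The table does not list stub ratios, so the hypothetical helper implied_stub_ratio (like the other names here, my own illustration rather than anything from Wikimedia) works backwards from a published depth value; the result is only approximate, since published depth values are rounded.

```python
from dataclasses import dataclass


@dataclass
class WikiStats:
    """One row of the table above (hypothetical container for illustration)."""
    articles: int
    total_pages: int
    edits: int


def depth(stats: WikiStats, stub_ratio: float) -> float:
    """Depth = (Edits/Articles) * (Non-Articles/Articles) * (1 - Stub-ratio).

    Assumption: non-articles = total_pages - articles.
    """
    if stats.articles <= 0:
        raise ValueError("articles must be positive")
    if not 0.0 <= stub_ratio <= 1.0:
        raise ValueError("stub_ratio must be between 0 and 1")
    edits_per_article = stats.edits / stats.articles
    non_articles_per_article = (stats.total_pages - stats.articles) / stats.articles
    return edits_per_article * non_articles_per_article * (1.0 - stub_ratio)


def implied_stub_ratio(stats: WikiStats, published_depth: float) -> float:
    """Back out the stub ratio that would reproduce a published depth value.

    Approximate only: published depth values are rounded to whole numbers.
    """
    raw = depth(stats, stub_ratio=0.0)  # depth before the stub correction
    if raw == 0:
        raise ValueError("no non-article pages or edits; depth is always 0")
    return 1.0 - published_depth / raw


# First row of the table: the largest edition after Arabic, listed depth 10.
row = WikiStats(articles=81_142, total_pages=215_494, edits=812_070)
print(round(depth(row, stub_ratio=0.0), 2))   # uncorrected depth
print(round(implied_stub_ratio(row, 10), 2))  # stub ratio implied by depth 10
```

For the first row, the uncorrected product comes to about 16.6, so a listed depth of 10 implies a stub ratio of roughly 0.4. The rows marked "--" have no published depth, so nothing can be back-calculated for them.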

The English language Wikipedia has articles with background about some of the abovementioned editions, including: Afrikaans, Arabic, Bambara, Egyptian Arabic, Northern Sotho, Swahili, Tsonga, Venda, Wolof, Xhosa, Yoruba, and Zulu.

Where to now?


Personally, I see a lot of potential in Wikipedia and other Wikimedia projects, even as issues with the main editions prompt speculation about Wikipedia's sustainability (for example, in the New York Times last year). Much of that potential, however, may actually lie in the years ahead, with the growth of smaller editions such as those in African languages. How that might work, and the conditions that would make it possible, are topics I will return to later this year.


1. This quote came from a message Wales posted to the Wikipedia-L list in 2005, but it is along the lines of similar statements he made earlier.
2. The Wikipedia page on its own history indicates that versions in some major non-European languages, including Arabic, were attempted in 2001; however, there were evident encoding issues for those in non-Latin scripts (see ar.wikipedia.com, for example).
3. The presentations were by Martin Benjamin and Kasper Souren.
4. Copied from the List of Wikipedias on Wikimedia's Meta-Wiki on 30 January 2016.
5. Wikimedia's formula for the "depth" index is: [Edits/Articles] × [Non-Articles/Articles] × [1 − Stub-ratio]