While in Mali in late 1999 and early 2000, inspired in part by early research by FUNREDES on Languages and Cultures of the Internet, I began thinking about strategies for increasing African language web content. I recognized, of course, that such content would come primarily from communities of speakers of African languages, as well as from internationally funded projects that were then beginning to think about how to use the internet for development. My motivation was rather to help foster an environment favorable to its creation and use.
Recent discussion of one project, reading about another (each of which deals in different ways with content and communication), and the Internet Society's (ISOC) August 2016 report on "Promoting Content in Africa" have prompted me to revisit this early effort and look at how things are playing out.
Looking forward from 1999
The basic idea in 1999 was to disaggregate approaches to internet content development in African languages, and consider how each could optimally contribute to the overall goal of greater presence of those languages in cyberspace. In 2003 I reworked that schema to share more widely - for example on the short-lived Africa Web Content Owner email list.1 Elements of this strategy were incorporated in different ways in later work, such as African Languages in a Digital Age (ALDA).
The main approaches were:
- Composition of text-based content (including, where possible, digitization of works previously published in African languages)
- Translation of text-based content from other languages (leveraging the then emerging machine translation [MT] technology)
- Development of content in non-text formats (with specific reference to audio2)
Therefore the possibility of taking existing texts in African languages - from published books or other printed materials - and putting them on the web was suggested as a way to give a small but significant boost to efforts to generate African language internet content. Such texts often have historical or cultural value, and may already be in a standard orthography (or in transcriptions that could easily be converted into one).3 A sustained effort to "weblish" these materials, according to this thinking, could quickly add quality material to what is available on the web in a number of languages and, more importantly, make many materials that are accessible only in university libraries more readily available to speakers of those languages. However, copyright protections limit the potential of this tactic (although some sites appear to have put some of these materials online without permission).
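Converting an older but systematic transcription into a current orthography, where (as in note 3) old notation maps onto today's characters 1:1 or occasionally 2:1, is essentially a character-mapping task. Here is a minimal sketch in Python, assuming a hypothetical mapping table: the pairs below are modeled loosely on older French-influenced transcriptions and are illustrative only, so they would need to be checked against the actual source material. The function applies a greedy longest-match substitution so that two-character sequences are replaced before single characters.

```python
import unicodedata

# Hypothetical old-to-current mapping, for illustration only.
# Verify every pair against the specific source before converting real texts.
OLD_TO_CURRENT = {
    "ny": "ɲ",  # 2:1 correspondence (two old characters -> one current)
    "ng": "ŋ",  # 2:1 correspondence
    "è": "ɛ",   # 1:1 correspondence
    "ò": "ɔ",   # 1:1 correspondence
}


def convert(text: str, mapping: dict[str, str]) -> str:
    """Rewrite text from an old transcription using greedy longest-match."""
    # Normalize to composed form so "e" + combining grave matches "è".
    text = unicodedata.normalize("NFC", text)
    # Try longer sequences first so "ny" wins over any single-letter rule.
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for old in keys:
            if text.startswith(old, i):
                out.append(mapping[old])
                i += len(old)
                break
        else:
            # No rule applies: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert("nyè ngò ka", OLD_TO_CURRENT))  # prints "ɲɛ ŋɔ ka"
```

A real conversion would need more care than this: a sequence like "ng" may in some words represent two separate sounds rather than one, and uppercase letters are not handled here, so the output should be proofread by someone who knows the language.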
Therefore, the emphasis shifted to alternative ways to create new content, especially translation (#2 above) of relevant, useful, and interesting material aided by MT, and content built around the spoken voice (#3), responding both to the oral dimensions of African cultures and to the low literacy rates in African languages.
MT, especially in that era, was mainly a hope for the future, and aside from a few experiments, most advances in the 2000s were for pairs of major (mainly Europhone) languages. The technology has since improved, but the statistical methods that have been key to that evolution require large language resources, such as parallel corpora of translated texts, that do not (yet) exist for many languages. As a result, the contribution of MT to content development in African languages still lies largely in the future.
As for audio content, it did not emerge as a significant component of the web (unless one counts songs, or audio files sent as email attachments, both of which were transferred over the internet rather than presented as part of web content). But see the discussion of video sharing below.
Just how much African language content?
Through the 2000s, African language content seemed to grow only marginally and unevenly. A pair of studies published by Rifal in 2003 provided perspectives on this subject that are still useful:
- A survey of Hausa, Lingala, Somali, and Xhosa content on the web by Anneleen Van der Veken and Gilles-Maurice de Schryver used a statistical method, inferring the amount of content from the presence of key words, and suggested that there was more content in these languages than one might estimate from web searches or surfing.
- A census of websites with African language content by Marcel Diki-Kidiri and Edema Atibakwa Baboya concluded, among other things, that these sites tended to be studies of the languages rather than communication in them.
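The general idea of estimating the amount of content from key words can be sketched as follows, as a rough illustration of the principle rather than a reconstruction of the survey's actual method. Assuming a reference corpus gives each common word's relative frequency in the language, the count of that word in the collection being measured, divided by its relative frequency, yields an estimate of the collection's total size in words. Taking the median over several words dampens outliers, such as a word that also exists in another language. The key words and numbers below are hypothetical.

```python
from statistics import median


def estimate_total_words(ref_freq: dict[str, float], observed: dict[str, int]) -> float:
    """Estimate the size (in running words) of a text collection.

    ref_freq: relative frequency of each key word in a reference corpus
              (occurrences divided by total words in that corpus).
    observed: how often each key word occurs in the collection being measured.
    """
    # One size estimate per key word that has usable data.
    estimates = [
        observed[w] / f
        for w, f in ref_freq.items()
        if f > 0 and w in observed
    ]
    if not estimates:
        raise ValueError("no usable key words")
    # The median is less sensitive than the mean to one misleading word.
    return median(estimates)


# Hypothetical key words and counts, for illustration only.
ref = {"word_a": 0.01, "word_b": 0.02, "word_c": 0.005}
seen = {"word_a": 1000, "word_b": 2000, "word_c": 600}
print(round(estimate_total_words(ref, seen)))  # prints 100000
```

If the counts come from web search results, they usually reflect matching pages rather than individual occurrences, so a figure produced this way is an order-of-magnitude indication at best.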
New kinds of content
The rise of social media, video sharing, and mobile devices over the last decade or so has changed how we think of and produce web content, opening new possibilities for African languages in cyberspace.
Social media, including blogs and wikis, makes the creation of content in any language easier. But in the case of many African languages, it also brings us face-to-face with other limitations: in input systems (for extended Latin and non-Latin scripts), in education (where schools use only Europhone languages, so people aren't familiar with writing their first languages), and in incentive (where the audience for text-based content in less widely spoken African languages is perceived to be small).
Video in a way fulfills the old idea of audio content on the web, with the obvious added advantage of visuals (though there are at least a few YouTube videos with a static presentation - a picture or line of text - and full audio in one or another African language). What's missing, as far as I can tell, is a way to find videos in specific languages that does not rely on the producer having tagged them appropriately (which may not happen).
Mobile devices have changed how we access and interact with content, and consequently how content is designed and even conceived. They have also become the most common way for Africans in general to access the internet - proportionately more so, I believe, than on any other continent. What I don't have a sense of is how much content in African languages is developed with mobile devices in mind. On the other hand, the input limitations for some writing systems would certainly be an obstacle to using some African languages in messaging, for example.
"Promoting Content in Africa," 2016
ISOC's recent report includes a look at structures to support development of content in Africa, including in African languages. It is encouraging to note the attention ISOC is giving in this report to the importance of African language content for internet use in Africa.
One of ISOC's recommendations for promoting local content in Africa, including content in African languages, is to develop local infrastructure such as data centers, Content Delivery Networks, and Internet Exchange Points. This idea of, in effect, creating a facilitating environment for African language content is an interesting strategy, and would complement other efforts such as those mentioned above.
1. The direct link to the post in the AWCO archives is apparently accessible only to subscribed group members. I've created an alternative presentation of it on my website. That post has more background.
2. An early consideration of audio and web content on AWCO mentions Native American interest in the topic, as well as a project in Mauritania (also available on my website).
3. Numerous transcriptions of histories and tales from before the adoption of current orthographies used systematic notation that generally corresponds directly to characters used today (1:1 or occasionally 2:1). I have encountered this, for example, in various older materials on Fula and Bambara. Cheikh Anta Diop's famous translations of scientific and European cultural texts into Wolof (1955) similarly used a regular transcription that predated the current standard orthography in Senegal.