Monday, September 08, 2014

Kamusi at 20: Keeping the vision alive and working

The Kamusi Project, which seeks to provide an open dictionary of all languages, for use in reference and in language technology development, is facing a challenge not unfamiliar to other language-related initiatives: funding. This effort - sometimes perceived as too ambitious or esoteric but always visionary in its goals and use of technology - is currently campaigning for support through the Global Giving Open Challenge.

Kamusi was originally developed in 1994 as a proposal by Dr. Martin Benjamin and Dr. Ann Biersteker, then both at Yale University's Council on African Studies. Billed as the "Internet Living Swahili Dictionary," its objective was to respond to the need for new reference material on Swahili, and to do so by using the potential of user contributions over the internet (this was more than six years before Wikipedia was launched). It is worth noting that one reason cited for exploring the internet medium for dictionary development was the unfavorable "economics of Swahili publishing." Kamusi is still today an excellent Swahili resource (both monolingual and English <-> Swahili), even as its goals have evolved.

Kamusi was run at Yale, with the benefit of US Department of Education funding, until 2006, and during this time was recognized as a finalist in the Stockholm Challenge 2001. At the end of this period, Dr. Benjamin - Martin to those who know him - summarized Kamusi in the context of African languages on the web at Wikimania 2006. He continued to run Kamusi as it transitioned in 2007 from Yale to a server hosted by the World Language Documentation Centre.

Under Martin's direction, Kamusi has since then been incorporated as a non-profit in the US and Switzerland (where he lives with his family), and has expanded its mission beyond Swahili to a pan-African and eventually global scope.

Funding from Canada's International Development Research Centre (IDRC) for Kamusi, as part of the multi-member African Network for Localisation (ANLoc) project, enabled Kamusi to lead development of locales for 100 African languages (locale data facilitates computer software handling a language) and terminology for 12 African languages. Later funding from the US National Endowment for the Humanities (NEH) enabled work on a pilot for Kamusi's multilingual model (basically, there's a lot more to a multilingual dictionary than words in parallel, since concepts don't line up neatly across languages).

Since the conclusion of major funding in 2012, Kamusi has continued work on the multilingual model, including how to annotate degrees of separation (when a concept is translated through another language), homophones, multi-word expressions (something I personally wish machine translation had been better at years ago), and data input from any language. In 2013 Kamusi's work gained it recognition as a launch partner in the White House Big Data Initiative.
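To make the "degrees of separation" idea a bit more concrete, here is a minimal sketch in Python. It is my own illustration, not Kamusi's data model or code: the link list, the language codes, and the function names are all assumptions. It treats each direct translation as a link between (language, term) pairs and counts how many links lie on the shortest path between two terms, so a pair translated through a third language comes out further apart than a directly linked pair.

```python
from collections import deque

# Hypothetical sample data: each pair is a direct translation link
# between (language code, term) entries. Not taken from Kamusi.
DIRECT_LINKS = [
    (("swa", "mkono"), ("eng", "hand")),
    (("eng", "hand"), ("fra", "main")),
    (("fra", "main"), ("deu", "Hand")),
]


def build_graph(links):
    """Build an undirected adjacency map from direct translation links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, source, target):
    """Return the number of links on the shortest path, or None if unlinked."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    # Breadth-first search visits entries in order of increasing distance,
    # so the first time we reach the target we have the shortest path.
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph(DIRECT_LINKS)
# Swahili to French passes through English in this sample data.
swa_to_fra = degrees_of_separation(graph, ("swa", "mkono"), ("fra", "main"))
```

In this toy data the Swahili and French terms are only connected through English, which is the kind of indirect pairing an annotation of this sort would flag for a reader or a translation tool.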

Although Kamusi has had an affiliation with l'École polytechnique fédérale de Lausanne (EPFL) since last September, this has not filled the funding gap to enable completion of the programming work necessary to bring all of this to fruition and take the Global Online Living Dictionary (GOLD) from a proven pilot project to a full-scale reality.
 
Looking at Kamusi's history - which is long in internet terms - one is impressed by the thought and effort that has gone into it, by Martin and by a range of other contributors, from its beginning at Yale to recent collaborations and donors, with many individual contributions all along. It would be a shame if current funding difficulties were to cause this important work to end.

For something like Kamusi, it helps, I think, to look as far ahead as we can look back. Twenty years from now, the advantages of building language resources for the many languages that lack the economic or political/policy weight to attract commercial and investor attention - even if they have demographic importance (keep in mind how quickly Africa's population is growing, for instance), but especially if those numbers aren't there either - will be a lot more apparent than they seem today. For countries where many of these languages are spoken, like most of those in Africa, there is a long-term need for projects like Kamusi that connect high-level language technology with less-resourced and often low-status languages - and in Kamusi's case, also link those with the more widely spoken international languages.

At this point, Kamusi's effort to gain enough support to qualify for ongoing listing on Global Giving is an attempt to keep the organization going at a critical period in its history. Please consider helping.

2 comments:

Martin Benjamin said...

Thanks for this insightful look at the project! I spent many hours this past week putting together a video that explains a bit about the underlying principles of the Kamusi system. It is well worth watching:

http://youtu.be/XJLaqwZkBK0

Don said...

Thanks, Martin. The video you mention on Molecular Lexicography is indeed worth the watch. It is especially useful to have such a way of building data on less-resourced languages (and cross-linking them) for language technology work.