Artificial intelligence, AI developers across Africa face a critical challenge: training AI systems to understand thousands of local languages when digital resources barely exist for most of them.
The continent is home to an estimated 1,500 to 3,000 languages, yet only 42 are currently supported by language models. This technological gap threatens to exclude millions from the AI revolution, according to researchers working to bridge the divide.
Training AI requires vast amounts of written material. English has over 7 million Wikipedia articles available for AI training. By contrast, Tigrinya—spoken by roughly 9 million people in Ethiopia and Eritrea—has just 335 articles. Akan, Ghana’s most widely spoken native language, has none.
Vukosi Marivate, a computer science professor at the University of Pretoria who led South Africa’s contribution to the African Next Voices project, points out that this disparity stems from economic factors rather than linguistic importance. There are more Swahili speakers than Finnish speakers, yet Finland represents a more lucrative market for tech giants like Apple and Google, explains Chinasa T. Okolo, founder of research institute Technēculturǎ.
Building Solutions from Scratch
The African Next Voices initiative spent two years recording 9,000 hours of speech across 18 languages in South Africa, Kenya, and Nigeria. Researchers worked with native speakers of different ages and regions, sometimes providing scripts but often simply giving prompts and recording natural responses.
For Isindebele, a language spoken in South Africa and Zimbabwe, finding written materials proved so difficult that researchers turned to a government manual for goat herders to help develop their prompts.
While this data collection cannot support a comprehensive large language model like ChatGPT or Gemini, the team focused on crucial sectors, including health and agriculture. These specialized models, built on smaller datasets, can achieve high accuracy within their specific domains.
Nyalleng Moorosi, a research fellow at the Distributed AI Research Institute, emphasises that cultural understanding is essential for AI development. Words and symbols carry different meanings across cultures—the St George’s cross, for instance, has political connotations in the UK that wouldn’t be obvious to someone from Ghana or Lesotho.
A study by her institute revealed that social media platforms failed to detect hate speech related to ethnic violence in Ethiopia, partly because automated systems and human moderators didn’t recognise the slang terms being used.
Moorosi advocates for AI accessibility in all languages, regardless of speaker population size, arguing that every language deserves representation or preservation.
Technical Hurdles Ahead
Beyond data scarcity, developers face additional obstacles. Many African languages lack standardized spelling rules or grammatical documentation. In Kinyarwanda, Rwanda’s national language, the country’s name can be spelled three different ways: uRwanda, Urwanda, and u Rwanda. This inconsistency complicates even basic text processing.
Infrastructure presents another barrier. The African Union reported in 2024 that only 10% of the continent’s data center demand was being met, creating a significant bottleneck for AI development.
The Stakes Are High
Without expanded language model development, millions of Africans risk being excluded from opportunities as the continent builds its AI infrastructure, Okolo warns. Marivate worries that smaller languages could disappear entirely if models aren’t developed for them.
The African Next Voices project has completed its initial data collection and transcription phase. While not currently working on additional languages, Marivate is already considering which languages might be tackled next in the ongoing effort to ensure Africa’s linguistic diversity survives into the digital age.

Funmilola Faleye is a Digital Marketing Specialist, with SEO and word press proficiency, also, she is an Artificial Intelligence (AI) Enthusiast, Personal Branding, and Public Relations Manager. She writes everything tech and general pop culture. She sees and talks with her pen.


















