A team of African researchers has unveiled African Next Voices, the largest known AI-ready dataset for African languages, aiming to remove one of the continent’s most significant barriers to digital participation: language exclusion.
Funded by a $2.2 million Gates Foundation grant, the initiative brought together linguists, computer scientists, and AI experts across Kenya, Nigeria, and South Africa to record 9,000 hours of real-life speech in 18 African languages, including Kikuyu, Dholuo, Hausa, Yoruba, isiZulu, Tshivenda, and Sesotho.
The open-access dataset is designed to empower developers to build translation, transcription, and conversational AI tools that reflect the way people live, speak, and interact on the continent.
The Problem: AI Leaves Millions Behind
Despite being home to over a quarter of the world’s languages, Africa’s linguistic diversity remains largely invisible in today’s AI landscape.
Most large language models, including ChatGPT, Gemini, and Claude, are trained predominantly on data in English, other European languages, and Chinese, all of which have abundant digital text. African languages, many of which are primarily spoken rather than written, lack sufficient online resources, which excludes millions of people from the benefits of AI-driven tools and services.
“We think in our own languages, dream in them, and interpret the world through them.
If technology doesn’t reflect that, a whole group risks being left behind,”
says Prof. Vukosi Marivate of the University of Pretoria.
African Next Voices: Building the Missing Data
The African Next Voices project recorded diverse, everyday conversations across farming, healthcare, education, and community life to create AI-ready voice datasets representative of real African speech patterns.
“We gathered voices from different regions, ages, and backgrounds so it’s as inclusive as possible.
Big tech can’t always see those nuances,”
explains Dr. Lilian Wanzare, a computational linguist based in Kenya.
By open-sourcing the dataset, the team hopes to unlock innovation for startups, developers, and enterprises building voice assistants, chatbots, and generative AI tools that understand African realities.
AI in Action: Solving Real-World Problems
1. Farming in Setswana with AI-Farmer
In Rustenburg, South Africa, Kelebogile Mosime, a small-scale farmer, uses AI-Farmer, an app that supports several South African languages, including Sesotho, isiZulu, and Afrikaans, to diagnose crop diseases, manage pest control, and get farming advice in her home language.
“For somebody in the rural areas like me, it’s useful,” says Mosime.
“When my plants are sick, I ask in Setswana and get solutions instantly.”
2. Lelapa AI: Localising Access to Essential Services
South African startup Lelapa AI, led by CEO Pelonomi Moiloa, is building AI-powered tools for banks, telecom firms, and healthcare providers in African languages.
“English is the language of opportunity.
For many South Africans who don’t speak it, missing English means missing out on healthcare, banking, or even government support.
We’re saying it shouldn’t be this way,” says Moiloa.
Why It Matters
- Digital Inclusion: With more than 2,000 languages spoken in Africa, the lack of local-language AI excludes hundreds of millions of people from digital tools, online education, and financial services.
- Economic Opportunity: Startups like Lelapa AI and apps like AI-Farmer demonstrate the market demand for localized solutions.
- Cultural Preservation: Without AI integration, indigenous knowledge, heritage, and creativity risk being lost in the digital age.
“Language is access to imagination,” says Prof. Marivate.
“It’s not just words — it’s history, culture, and knowledge.
If indigenous languages aren’t included, we lose more than data; we lose ways of seeing and understanding the world.”
The Road Ahead
With African Next Voices laying the groundwork, developers can now:
- Build AI-powered translation tools and voice assistants in local languages and dialects.
- Support education tools in underrepresented languages.
- Improve access to essential services like banking, healthcare, and agriculture.
However, researchers stress that this is just the beginning. Covering 18 languages is a breakthrough, but with more than 2,000 languages spoken across the continent, scaling up will require sustained investment, public-private partnerships, and local innovation leadership.