Nigeria’s Multilingual AI Tool to Boost Digital Inclusion
Introduction
When the Nigerian government announced plans to develop a multilingual AI tool to enhance digital inclusion, 28-year-old computer science student Lwasinam Lenham Dilli was thrilled. Dilli had struggled to find datasets to build a large language model (LLM) in his native Hausa language for his final-year project at university.
Challenges and Importance of Local Language LLMs
Dilli struggled to find clean Hausa and English text online, which he needed to build an LLM for AI chatbots. He believes that creating local-language LLMs keeps local dialects and languages in the AI ecosystem and prevents them from being forgotten.
Despite the global excitement around AI tools like OpenAI’s ChatGPT, many advanced systems struggle with African languages such as Hausa, Amharic, and Kinyarwanda. They can return nonsensical responses in these languages, which highlights the need for more inclusive AI development.
Nigerian Government Initiative
Nigeria’s Digital Economy Minister, Bosun Tijani, announced that the new LLM would be trained on five low-resource languages and accented English. The government will partner with Nigerian AI startups and collect local data from volunteers fluent in Yoruba, Hausa, Igbo, Ibibio, and Pidgin. The project will also leverage the expertise of over 7,000 fellows from Nigeria’s tech talent programme, a government scheme to train people in coding and programming.
Silas Adekunle, co-founder of Awarri, an AI startup involved in the initiative, described the challenges of creating an AI tool that understands Nigeria’s diverse linguistic and cultural landscape. Despite limited resources, the team is using creative methods to gather data, label it efficiently, and train the model.
Bridging the AI Language Gap
Africa is home to over 2,000 languages, yet most are underrepresented online. English dominates the digital space: it is the language of around 50% of all websites. Alongside the Nigerian government’s efforts, African startups are also developing AI tools in languages like Swahili, Amharic, Zulu, and Sesotho.
In Kenya, Jacaranda Health has created the first LLM in Swahili to improve maternal healthcare. Its system, UlizaLlama, built on Meta’s Llama 3, aims to give expectant mothers personalised responses and to improve the accuracy and speed of the organisation’s SMS service.
In South Africa, the Masakhane initiative uses open-source machine learning to translate African languages. Lelapa AI has developed VulaVula, a language processing tool for English, Afrikaans, Zulu, and Sesotho.
Data Scarcity and Ethical Concerns
Building LLMs for African languages faces significant challenges, including scarce data and ethical concerns. Unlike high-resource languages such as English, many African languages have little data available. Collecting more raises issues of consent, privacy, and compensation, which are not yet regulated in many African countries.
Michael Michie, co-founder of Everse Technology Africa, highlighted the importance of respecting communities that may not want to share their language data. Vukosi Marivate, co-founder of Lelapa AI, stressed the need for guidelines to prevent exploitation and ensure that the development of LLMs benefits the communities they serve.
Open licensing initiatives like Creative Commons, which allow creators to share their work under set conditions, do not fully resolve these concerns. Proper compensation and acknowledgment of original contributors are crucial to ensure that the development of LLMs respects and benefits African languages and their speakers.