Pushing for NLP for Africa

The Natural Language Processing (NLP) technology that makes engagement across digital channels simple and intuitive for English language speakers is still not serving millions of people who speak African indigenous languages.

This is according to NLP researchers who took part in a webinar hosted by the Institute of Information Technology Professionals South Africa (IITPSA) and its Special Interest Group on AI and Robotics (SIGAIR).

Johan Steyn, technologist, management consultant and chair of the IITPSA Special Interest Group in AI and Robotics, noted: “Technology has come a long way: where once we used our fingers to type in queries, now we can use our voices and in future we will use our brains to interact with technology. The challenge we face in Africa and South Africa is that most indigenous African languages are not represented in platforms that often originate from North America or Europe.”

There is a lack of resources in African languages to support NLP, said Dr Vukosi Marivate, ABSA Chair of Data Science at the University of Pretoria, Co-Founder of the Deep Learning Indaba, and Data Science and Natural Language Processing Researcher.

“As we saw with the history of search engines, in the mid-2000s, systems were still very primitive and users had challenges in structuring their search keywords to find what they were looking for. Now people can ask search engines questions almost in the same way they might ask another human for information. You now have virtual assistants going from speech to text, interpreting the query and responding with useful information. We have gotten used to this, but much of it is built in English and leaves out much of the world’s languages. To add these indigenous languages there are a lot of foundational issues to overcome. You won’t find an article in Wikipedia on the capital of South Africa in isiNdebele, for example.”

Jade Abbott, Machine Learning Lead at Retro Rabbit and NLP Researcher for African Languages at Masakhane, said: “It is a matter of building from scratch to extend NLP to indigenous languages. So there would need to be a data creation exercise, or data archaeology to work with experts such as professors of linguistics to digitise old books – such as books of idioms in indigenous languages. It’s also a matter of trying to find the old books and publishers, and working with translators to create the right data for representative, non-toxic models.”

They said collaboration on creating data resources would enable progress in extending NLP to all languages.

“In South Africa, R&D isn’t as funded as it could be,” said Dr Marivate. “You need to have a culture of R&D. Because ML is making companies money, money is flooding into academia – but mostly abroad. Multinationals represented in Africa aren’t really investing in R&D on the continent.” He said NLP R&D in Africa could be boosted if universities were connected to each other and worked together, and if young innovators were mentored to progress in this space.

The IITPSA hosts a series of Tabling Tech webinars and special interest groups, which are currently free to attend. These events support life-long learning (LLL) and the continued professional development of individuals in the industry. For more information, visit www.iitpsa.org.za