Emerging AI Groups Fight English Dominance to Preserve Global Languages
October 22, 2024
Emerging research groups like Masakhane, AI4Bharat, and AmericasNLP are working to counter the dominance of English in artificial intelligence by developing language-specific tools that cater to their communities.
AI development is heavily skewed toward English: models are trained on significantly more English text than on text in other languages, which leads to real-world disparities.
Prominent tech companies often overlook the needs of non-English speakers, relying on machine-translated data that reinforces the dominance of English.
These disparities have serious implications for marginalized communities, resulting in issues such as inaccurate medical advice in Hindi and wrongful arrests due to mistranslations in Arabic.
To address these challenges, policymakers should prioritize the inclusion of non-English languages in technology development, empowering non-English speaking communities.
Support and funding from governments and organizations are essential for these research groups to preserve non-English languages in the digital age.
Many language technology initiatives emphasize data ownership and inclusive authorship models, empowering communities to shape AI development.
The IndoNLP group, for instance, focuses on data ownership in its NusaCrowd project, ensuring that original contributors maintain control over their datasets.
Masakhane employs a non-traditional authorship model that values contributions beyond mere analysis and writing, incorporating the lived experiences of community members.
AI4Bharat has developed the IndicVoices speech dataset for 22 Indian languages by engaging diverse community members to accurately capture local language practices.
AI4Bharat's tools are already being utilized by significant institutions, including the National Programme on Technology Enhanced Learning and the Supreme Courts of India and Bangladesh.
The AmericasNLP initiative is dedicated to revitalizing Indigenous languages by creating educational materials for teaching Native American languages.
Summary based on 1 source
Source
Tech Policy Press • Oct 21, 2024
Local AI Research Groups are Preserving Non-English Languages in the Digital Age