Emerging AI Groups Fight English Dominance to Preserve Global Languages

October 22, 2024
  • Emerging research groups like Masakhane, AI4Bharat, and AmericasNLP are working to counter the dominance of English in artificial intelligence by developing language-specific tools that cater to their communities.

  • AI development is heavily skewed towards English: models are trained on significantly more English text than text in any other language, which leads to real-world disparities.

  • Prominent tech companies often overlook the needs of non-English speakers, relying on machine-translated data that reinforces the dominance of English.

  • These disparities have serious implications for marginalized communities, resulting in issues such as inaccurate medical advice in Hindi and wrongful arrests due to mistranslations in Arabic.

  • To address these challenges, policymakers should prioritize the inclusion of non-English languages in technology development, empowering non-English-speaking communities.

  • Support and funding from governments and organizations are essential for these research groups to preserve non-English languages in the digital age.

  • Many language technology initiatives emphasize data ownership and inclusive authorship models, empowering communities to shape AI development.

  • The IndoNLP group, for instance, focuses on data ownership in its NusaCrowd project, ensuring that original contributors maintain control over their datasets.

  • Masakhane employs a non-traditional authorship model that values contributions beyond mere analysis and writing, incorporating the lived experiences of community members.

  • AI4Bharat has developed the IndicVoices speech dataset for 22 Indian languages by engaging diverse community members to accurately capture local language practices.

  • AI4Bharat's tools are already used by major institutions, including the National Programme on Technology Enhanced Learning and the Supreme Courts of India and Bangladesh.

  • The AmericasNLP initiative is dedicated to revitalizing Indigenous languages of the Americas by creating educational materials for teaching them.

Summary based on 1 source

