Teuken-7B: Europe's Multilingual AI Model Launches with 7 Billion Parameters on Hugging Face
November 26, 2024

Teuken-7B is designed as an open-source alternative for researchers and businesses, enabling them to use and adapt the model for commercial projects while ensuring data security.
The model was trained on Germany's JUWELS supercomputer, equipped with 3,744 NVIDIA A100 GPUs, providing the computational power necessary for its development.
The OpenGPT-X project is a collaborative effort led by the Fraunhofer Institutes, with contributions from various institutions, showcasing a model of cooperative public research.
Teuken-7B is available in two versions: one restricted to research use and another released under the Apache 2.0 license for commercial use; both offer comparable performance.
The model aims to support AI applications across industries and performs strongly on multilingual benchmarks covering 21 European languages.
The project is set to continue until March 31, 2025, allowing for ongoing enhancements and evaluations of the model.
Initiated in early 2022, the project aims to develop an AI model that aligns with European values, prioritizing data protection and linguistic diversity.
The model is hosted on Gaia-X infrastructure, ensuring compliance with strict European data protection regulations, which is essential for businesses handling sensitive information.
The OpenGPT-X research project has released the 'Teuken-7B' language model for download on Hugging Face; the model has seven billion parameters and was trained in all 24 official languages of the European Union.
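For readers who want to try the model, the sketch below shows how a Teuken-7B checkpoint could be loaded from Hugging Face with the standard transformers API. The repository name is the Apache 2.0 instruct variant as listed at launch and should be treated as illustrative; check the openGPT-X organization page for the current names, and note that the trust_remote_code flag is an assumption based on the model shipping custom code on the Hub.

```python
# Minimal sketch: loading a Teuken-7B checkpoint from Hugging Face.
# The repository id below is assumed from the launch listing; verify it
# on the openGPT-X organization page before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openGPT-X/Teuken-7B-instruct-commercial-v0.4"  # assumed repo id

# trust_remote_code is assumed to be needed because the model ships
# custom tokenizer/model code on the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 7B model within a single-GPU memory budget
    device_map="auto",
    trust_remote_code=True,
)

# Prompt in any of the 24 official EU languages the model was trained on.
prompt = "Wie lautet die Hauptstadt von Belgien?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```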
Stefan Wrobel, director at Fraunhofer IAIS, expressed optimism that Teuken-7B will be widely adopted across various sectors, addressing the growing demand for transparent and customizable generative AI solutions.
Developed over two years, the project emphasizes energy-efficient training and operation, contributing to advancements in multilingual AI models.
Teuken-7B stands out by incorporating approximately 50% non-English pre-training data, enhancing its performance for multilingual applications compared to other models.
Summary based on 9 sources
Sources
Tech Xplore • Nov 26, 2024
Multilingual and open source: OpenGPT-X research project releases large language model
Digitec • Nov 27, 2024
European AI on the rise: OpenGPT-X publishes "Teuken-7B"
Onlineportal von IT Management • Nov 26, 2024
Teuken-7B: European AI model challenges US giants
Mirage News • Nov 26, 2024
OpenGPT-X Unveils Multilingual Open Source Model