Open Multilingual Language Models


Apertus is Trained on 15 Trillion Tokens Across 1,000+ Languages

December 22, 2025 | Tech
A consortium of Swiss academic and research institutions, including EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS), has introduced Apertus, a major new multilingual language model. The open-source model was trained on an exceptionally large and linguistically diverse dataset: "15 trillion tokens across more than 1,000 languages," to be exact. With 40% of the training data in languages other than English, Apertus places particular emphasis on "many languages that have so far been underrepresented in LLMs, such as Swiss German, Romansh, and many others."

The model's complete technical architecture, training data, and development methodology are publicly documented and released under a permissive license that allows both academic and commercial use. The development process also took regional legal frameworks into account, such as Swiss data protection law and the emerging requirements of the European Union's AI Act.

Image Credit: EPFL, ETH Zurich, and the Swiss National Supercomputing Centre

Trend Themes

  1. Hyper-multilingual Language Models — Apertus' training on 15 trillion tokens across more than 1,000 languages highlights a trend toward hyper-multilingual language models, creating new possibilities for AI to process and understand a vast array of linguistic contexts.
  2. Open-source AI Platforms — The open-source nature of Apertus, combined with its comprehensive documentation, reflects a growing trend in AI toward transparency and collaboration, encouraging innovation across both academic and commercial sectors.
  3. Regional Legal Compliance in AI — The integration of regional legal frameworks into AI development, as seen in Apertus' consideration of Swiss and EU regulations, reflects a trend toward designing AI models with legal compliance as a foundational feature.

Industry Implications

  1. AI Language Services — The emergence of hyper-multilingual models like Apertus opens new frontiers for AI language services, providing tools that can serve a global and linguistically diverse user base.
  2. Open-source Software Development — Apertus exemplifies the expanding role of open-source paradigms in software development, inviting contributions and applications from a broader community while lowering barriers to entry for smaller organizations.
  3. Legal Tech — AI models such as Apertus that incorporate regional legal considerations could reshape legal tech, enabling platforms designed from the outset to navigate complex regulatory environments.