TL;DR
Portugal’s AMÁLIA, a €5.5 million European Portuguese LLM, is operational but prompts three critical questions about openness, native data sufficiency, and optimization focus. These issues impact national AI policy and European sovereignty efforts.
Portugal’s €5.5 million AMÁLIA language model, a major national AI project, is now operational with a publicly available base version, but it raises three fundamental questions about openness, native-language data, and strategic objectives that remain unanswered.
AMÁLIA was developed through a consortium involving about 60 researchers from Portugal’s top institutions, including NOVA and IST, and was announced in December 2024. The model, built as a continuation of the EuroLLM multilingual foundation, was completed in September 2025, with the base version now accessible via the FCT IAedu platform to approximately 450,000 academic users. It handles Portuguese text, with multimodal capabilities planned for future updates.
Technically, AMÁLIA is not trained from scratch but extends the EuroLLM model, incorporating 107 billion tokens during extended pre-training, including 5.8 billion tokens from Portugal’s web archive Arquivo.pt. It outperforms previous open models on Portuguese benchmarks and surpasses Qwen 3-8B on most tests, though it still lags behind on some specific benchmarks like ALBA. The final version is expected in June 2026, with ongoing development and evaluation.
Critically, Duarte O.Carmo’s analysis highlights three key questions: How open is “fully open” really? How much native-language data is sufficient? And what should be the primary goal of the model? These questions are central to evaluating the project’s strategic and technical success, especially given the model’s public funding and national scope.
AMÁLIA · The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmarks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument. Each question has a specific, concrete form in AMÁLIA, and each applies equally to the other European sovereign-LLM projects.
The three questions form a dependency chain. Q3 (the optimization target) determines Q2 (how much native-language data is needed), which in turn conditions Q1 (whether the release is open enough for the community to contribute). The European sovereign-LLM movement collectively benefits when these questions become standard methodology disclosure rather than exceptional critique.

107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and it has a surprising answer. For a model whose entire stated purpose is to prioritize European Portuguese, the clearly native share of extended pre-training is about 5.4%: 5.8 billion of 107 billion tokens. The implications cascade into every other question.
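The share follows directly from the two token counts reported above. A minimal sketch of the arithmetic, assuming the 5.8 billion Arquivo.pt tokens are the only clearly pt-PT portion of the 107 billion (the function and constant names are illustrative, not taken from the AMÁLIA project):

```python
# Token counts as reported in this article, in billions of tokens.
TOTAL_EXTENDED_PRETRAINING_TOKENS_B = 107.0
ARQUIVO_PT_TOKENS_B = 5.8  # assumed to be the only clearly pt-PT portion


def native_share(native_tokens: float, total_tokens: float) -> float:
    """Return the native-language fraction of a token budget."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return native_tokens / total_tokens


share = native_share(ARQUIVO_PT_TOKENS_B, TOTAL_EXTENDED_PRETRAINING_TOKENS_B)
print(f"pt-PT share of extended pre-training: {share:.1%}")
# prints: pt-PT share of extended pre-training: 5.4%
```

The same function works for any other national project that publishes both numbers, which is part of the case for making this accounting a routine disclosure.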

The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” requires in operational terms. Olmo doesn’t lead frontier benchmarks. That’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.

Four strategic positions. AMÁLIA between two and three.
More than €100M in publicly disclosed funding is spread across the major European sovereign-LLM initiatives. The structural question every project faces: what is the actual competitive position you’re staking? There are four options, none mutually exclusive, but each requires different commitments.

Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for Portugal’s AI Sovereignty and Policy
The questions raised about AMÁLIA reflect broader concerns across Europe regarding the transparency, data sufficiency, and strategic focus of national LLM initiatives. As governments invest public funds into these models, understanding their openness and native-data reliance is crucial for maintaining sovereignty and competitive advantage. The ongoing debate influences policy decisions, funding priorities, and the future of European AI independence.
European Sovereign LLM Projects and Structural Challenges
Across Europe, countries like Italy, Germany, France, and Norway are developing their own large language models, often with similar strategic questions about openness, native data, and goals. These projects are typically publicly funded and involve academic and research institutions, but they face common issues: how open can these models truly be? How much native-language data is enough? And what should be the primary optimization target? The case of AMÁLIA exemplifies these issues, highlighting a pattern where national efforts are more about strategic presence than technical maturity.
The European sovereign-LLM movement is at a pivotal point, with many projects still in progress and many questions unresolved. The emphasis on public accountability and strategic sovereignty makes these questions more urgent than ever, especially as models like AMÁLIA enter operational phases.
“The three questions are not accusations; they are the structural framework that any honest evaluation of national LLM investment needs to internalize.”
— Duarte O.Carmo
Unanswered Questions About AMÁLIA’s Openness and Goals
It remains unclear how open the final version of AMÁLIA will be, especially regarding access to underlying data and model weights. Additionally, the adequacy of native Portuguese data and the primary objectives guiding the model’s development are still under discussion. The final strategic positioning and technical capabilities will likely evolve before the June 2026 release, but current details are limited.
Next Milestones and Ongoing Evaluations
The final version of AMÁLIA is scheduled for release in June 2026, with ongoing testing, benchmarking, and policy discussions. Researchers and policymakers will closely monitor its openness, native data reliance, and performance across Portuguese benchmarks. The next 12-24 months will be critical for addressing the current questions and defining Portugal’s and Europe’s strategic approach to sovereign LLMs.
Key Questions
What are the main concerns about AMÁLIA’s openness?
It is still unclear how open the final model will be regarding access to weights, training data, and licensing, which impacts transparency and sovereignty.
How much native Portuguese data has been used in AMÁLIA?
Approximately 5.8 billion tokens from Portugal’s web archive were used during training, but questions remain about whether this is sufficient for optimal performance and cultural representation.
What are the strategic goals guiding AMÁLIA’s development?
The project aims to support Portuguese academia and demonstrate national AI capability, but the specific long-term goals, such as openness and commercial use, are still under discussion.
Will AMÁLIA be accessible to the public?
Currently, the base version is available to academic users, but the extent of future public access or licensing remains uncertain as the project progresses.
How does AMÁLIA compare to other European models?
AMÁLIA outperforms several open models on Portuguese benchmarks and is technically comparable to other European efforts, but questions about data, openness, and strategic focus are common across projects.
Source: ThorstenMeyerAI.com