This dashboard tracks inter-annotator agreement for the TQE-2026-1 evaluation project. A team of trained evaluators is working toward a shared agreement threshold by annotating the same generative-AI translation quality evaluation projects in Label Studio, using a customized MQM typology. After each project, agreement is measured at five levels: span detection, span boundaries, error category, subcategory, and impact rating; the findings are used to calibrate the team for the next training round. Reports are generated from a Jupyter notebook and linked below. The goal is consistent, deployable annotations that can be used to train evaluators, give feedback to translators, and fine-tune LLM-based quality evaluation systems.
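As a minimal sketch of how categorical agreement at one of these levels might be computed, the snippet below calculates Cohen's kappa for two annotators' error-category labels over the same spans. The metric, label values, and annotator data here are illustrative assumptions, not the project's actual notebook code or taxonomy.

```python
# Sketch: pairwise agreement at the error-category level via Cohen's kappa.
# Labels and data are hypothetical; real reports come from the project notebook.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical error-category labels for the same ten spans
ann1 = ["accuracy", "terminology", "accuracy", "style", "accuracy",
        "terminology", "style", "accuracy", "terminology", "accuracy"]
ann2 = ["accuracy", "terminology", "style", "style", "accuracy",
        "accuracy", "style", "accuracy", "terminology", "accuracy"]

print(round(cohens_kappa(ann1, ann2), 3))  # → 0.683
```

Span detection and boundary agreement would need an overlap-based measure rather than item-level kappa, since annotators may not mark the same spans at all.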
Linguistic Assets
Specifications
Overall Specifications
Domain: Generative AI
Language families and variants: Largely Latin American Spanish, U.S. English
Audience (broadly): Spanish and English speakers in Latin America (including the United States)
Correspondence: Whole document adaptation
Glossary & Error Examples
GAI Glossary
GAI terminology, including regional terminology conflicts that must be navigated in translation
This project evaluates a section of a translation of a Spanish-language academic paper on algorithmic governmentality. The source text covers the technical foundations of large language models and contextualizes how AI is being integrated into digital ecosystems across industries.
Content complexities: The paper requires fluency in two specialized registers simultaneously, critical social theory and technical AI vocabulary, each with established terminology in English-language scholarship. The English text shows some syntactic awkwardness that raises questions both about whether machine translation was used and about the accuracy of the terminology choices.
This project evaluates a translated section of the WIPO Patent Landscape Report on Generative AI, which documents the rapid growth of patent activity following the launch of ChatGPT, covering technological drivers and the expanding range of real-world applications.
Note: A target text was produced internally for the purposes of this project.
Content complexities: High density of technical acronyms (GANs, VAEs, LLMs, MLLMs); proper nouns requiring no translation (model names, company names, product names); ranked list structures with enumerated claims; genre-mixing of legal/IP register with technology journalism register
This text is legal commentary from a Mexican law firm examining how the Federal Law on Copyright (LFDA) fails to address AI-assisted authorship, with two proposed institutional reforms. We evaluated an adapted version of the original Spanish.
Note: We did not use the English translation provided within a PDF version of this content. Instead, a target text was produced internally for the purposes of this project.
Subdomains: Intellectual property, Mexican law, AI and copyright
Text type: Legal commentary
Purpose: Informative
Author perspective: Legal professional
Audience: Government, Policy makers
Content complexities: The text blends legal, academic, and essayistic registers that must remain distinguishable throughout. Rhetorical question sequences function as cumulative discursive units rather than independent sentences. The Dark Forest Theory and Turing Test/TTR are introduced as analogies and then recalled as structural through-lines, requiring consistent terminology across distant passages to preserve their rhetorical weight.