Can machine translation really help minority languages in Europe?
An analysis with value scenarios
Keywords:
minority languages, machine translation, value scenarios, MT, LLM
Abstract
Machine translation (MT) has improved greatly in quality over the last decade and has become nearly omnipresent across society. Neural MT (NMT) and, more recently, large language models (LLMs) such as the generative pretrained transformer (GPT) have made translation into many languages easily accessible to all users from any phone or computer. However, most MT models are English-centric and produce good-quality results only for languages with large amounts of data. For minority languages, the challenge is often understood as one of data scarcity, but systemic differences between language communities must also be taken into account if MT systems for these languages are to be genuinely useful. In this paper, we use value scenarios to imagine the systemic impacts for two languages with distinct sociolinguistic realities: Catalan and Karelian. The goal is to outline the main challenges and potential harms when considering MT for minority languages and to suggest some general guidelines that should be followed in future research and applications.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
