Mar 8 • 19:30 UTC 🇪🇪 Estonia ERR

What AI understands about the Estonian language and culture

A recent experiment by ERR's Novaator revealed that major language models struggle to grasp the nuances of the Estonian language, raising concerns about data sharing and copyright.

ERR's Novaator conducted an experiment to evaluate how well various AI language models understand the Estonian language and culture. The findings indicated that these models, particularly those from large corporations, lack an adequate grasp of the language's intricacies and unique features. For instance, when asked about one of the most famous lines in Estonian literature, the AI gave an unexpected and nonsensical answer. This highlights how difficult it is for AI to interpret culturally significant phrases and references.

The Novaator team tested the free versions of five major language models using a questionnaire tailored to Estonian linguistic and cultural specifics. They assessed the models' knowledge of topics ranging from the content of Lennart Meri's "Hõbevalge" to the number of vowels in the word "jäääär". Grok performed best, followed closely by Claude Sonnet, Gemini, and ChatGPT; the models showed some competence but still revealed limitations in cultural comprehension.
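The vowel-counting question is easy to check mechanically, which makes it a useful contrast with a model's answer. The following is a minimal Python sketch, not the method Novaator used: the function name `count_vowels` and the vowel set (a, e, i, o, u, õ, ä, ö, ü) are assumptions made for illustration.

```python
import unicodedata

# Assumed vowel set for this sketch: the nine Estonian vowel letters.
ESTONIAN_VOWELS = set("aeiouõäöü")


def count_vowels(word: str) -> int:
    """Count Estonian vowel letters in a word (hypothetical helper)."""
    # NFC normalization merges "a" + combining diaeresis into a single "ä",
    # so decomposed input is counted the same way as precomposed input.
    normalized = unicodedata.normalize("NFC", word.lower())
    return sum(1 for ch in normalized if ch in ESTONIAN_VOWELS)


print(count_vowels("jäääär"))  # 4
```

The normalization step matters because the same visible word can arrive as different code point sequences; without it, a decomposed "ä" would be counted as a plain "a" followed by an unrecognized combining mark.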

The implications of this study are significant, especially for data sharing with chatbots and related technologies. Using Estonian-language data to build AI models raises critical questions about copyright and data protection, underscoring the need for care in how that data is sourced and used. The findings point to a gap that needs addressing if AI systems are to represent and understand different languages and cultures without infringing on local rights and regulations.
