The Limits of Machine Translation: Why AI Struggles with Literature
When Google Translate was released, in 2006, I was an eighth grader stumbling through introductory Spanish, and my teacher had little reason to worry about her students using it to cheat. It’s almost hard to remember now, but early machine-translation systems were laughably poor. They could give you the general thrust of, say, a Portuguese website, but they often failed at even basic tasks. In one case from 2010, a Google-translated summons reportedly instructed a defendant to avoid court instead of showing up there.
Machine translation didn’t become the juggernaut we know until 2015, when Baidu released its large-scale neural machine-translation system, built on neural networks, the same broad family of technology that underlies chatbots such as ChatGPT today. Google started switching from a statistical model to a neural system not long after, as did peers such as Systran and Microsoft Translator. It was a major leap forward: Tourists can order coffee and haggle for knickknacks thanks to the magic of Google Translate; I’ve occasionally used Reverso Context, an AI tool, in my own published translations. But still, one area of translation has proved remarkably impervious: literature, which many researchers call the “last bastion” of human translation.
Most studies find that neural machine-translation models can translate only about 30 percent of novel excerpts—usually simple passages—with acceptable quality, as determined by native speakers. They struggle because, at its core, literary translation is an act of approximation. The best option is sometimes not the correct one, but the least bad one. Translators often have to sacrifice literal meaning for the greater good of the piece. But AI is less adept at making such compromises and at landing on creative solutions that, although technically less correct, preserve aspects of a book that are hard to quantify: voice, spirit, sensibility. “You’re weighing different losses and different gains against one another,” Heather Cleary, a literary translator from Spanish to English, told me. A translator has to ask herself: What am I going to really prioritize?
The Last Bastion of Human Translation: Why Books Stump Bots
Daniel Hahn’s recent book, Catching Fire: A Translation Diary, is full of these types of dilemmas. In the book, he walks through his process of translating Jamás el Fuego Nunca, a novel by the Chilean writer Diamela Eltit. One chapter, for example, begins with the following four words: “Frentista, estalinista, asesina loca.” Let’s focus on frentista as a case study. The most literal translation (and the one offered by some AI translators) would be “frontist,” which is basically meaningless in English. Hahn suspects that frentista is meant to be a term for a Chilean leftist, and with a fellow translator’s help, he establishes that it is likely a derogatory term referring to a specific anti-Pinochet guerrilla group.
Hahn must ask himself what’s more important in this case: specificity, or maintaining readability and capturing the writer’s voice. He throws around a few options—“paramilitary,” “commie thugs”—before settling on “extremist.” He also switches the order to foreground “Stalinist” (estalinista), giving the reader a sense of what kind of extremist they’re dealing with. Then there’s the problem that Spanish is a gendered language; it’s clear in the original that the speaker is addressing a woman. As a result, Hahn renders asesina loca as “crazy killer bitch.” The final version reads “Stalinist. Extremist. Crazy killer bitch.” It’s imperfect, but it’s also great.
Google Translate, by contrast, suggests “Frontist, Stalinist, crazy murderer.” The sentence is correct, sure, but clumsy, and all but unintelligible to non-Chilean readers. A specialized model like the kind used in most studies of neural machine translation—perhaps one trained specifically on Chilean literature—would certainly fare better. But it’s still hard to imagine one coming up with something close to Hahn’s solution.
The Art of Approximation: Why Literary Translation Requires Human Touch
When you compare human translations with edited machine translations, however, things suddenly get a lot more interesting. In the production of commercial texts—an instruction manual for a printer or a kitchen gadget, say, or even a news article—it’s standard for humans to edit a raw machine translation and then send it to press. This process, which is called post-editing (PE), has been around since long before neural networks started being used for translation. Studies vary, but most conclude that it’s faster and cheaper than translating from scratch.
Since the release of neural models such as those used by Baidu and Google Translate, a body of research has investigated whether the PE process can be applied to literature too. When presented to readers, PE translations perform comparably to fully human translations in some studies. (Most of the research to date has compared European languages, which limits the conclusions that can be drawn from it.)
How well PE fares is influenced by several factors, but in studies, the method tends to do less well with challenging literary works and better with plot-driven novels. Ana Guerberof Arenas, an associate professor in translation studies at the University of Groningen, in the Netherlands, told me that machines are more likely to trip over works with more “units of creative potential”—metaphors, imagery, idioms, and the like. Hahn’s frentista dilemma is a prime example—the more creativity required, the wider the gap between a human solution and a machine one.
Post-Editing: Bridging the Gap Between Machines and Human Translators
Of course, the post-editor can touch up a poor rendition of a challenging passage. But some studies suggest that PE versions are different from fully human ones in subtle, vitally important ways. Antonio Toral, an associate professor at the University of Groningen who frequently collaborates with Guerberof Arenas, explained one example to me: “In translation from scratch, the translator decides where the translation goes from the start. If a sentence can be translated in three main ways, the translator is going to decide.” But in post-editing, “the machine is going to make that decision, and then you just fix whichever of the three the [machine-translation] system has picked.” This reduces the translator’s voice and could result in more homogeneous translations across the literary market.
It could also lead to inconsistent voice within a single translation: Toral told me that in research he has collaborated on, post-editors deviated from the raw machine translation less and less often as they progressed through a work. Recent research led by Guerberof Arenas found that compared with entirely human translations, PE translations are consistently less creative, meaning they depart from literal translations less often and perform less well with those units of creative potential. The differences here are subtle, a question of inches rather than miles. But these subtleties—voice, rhythm, style—are precisely what can separate a functional translation from a great one.
Despite these drawbacks, some European publishers are actively releasing PE titles. Nuanxed, an agency that produces PE translations for publishers, has completed more than 250 books, most of them commercial fiction, since launching two years ago. When I spoke with Robert Casten Carlberg, Nuanxed’s CEO and one of its co-founders, in October, it sounded like Nuanxed was doing well. “The publishers we work with, once they have worked with us, they come back and they want to do more,” he told me. Perhaps that’s because Nuanxed has really nailed human-machine translation; Carlberg described his company’s version as “broader” and “more holistic” than the PE norm, though he was unwilling to discuss specifics. But more likely, I think, is that the quality gap between PE and human translation doesn’t bother the average reader of action-driven commercial fiction. If the customers are happy, it’s easy to see why Nuanxed might not be so concerned about the recent academic research suggesting that PE isn’t optimal.
The Impact of AI on Literary Translation: Challenges and Controversies
The changes in the industry aren’t going unnoticed. “Colleagues are starting to be offered post-editing jobs from the publishing houses that would normally offer them translation jobs,” Morten Visby, a Danish literary translator and the former president of
Analyst comment
Short analysis: While machine translation has made significant progress, it still struggles with the nuances of literary translation. Most studies find that neural machine-translation models can translate only about 30% of novel excerpts, usually simple passages, with acceptable quality. Human translators remain essential for preserving voice, spirit, and sensibility in literature. The industry may increasingly adopt post-editing methods, which could marginalize translators’ roles and affect their job security and rights. Nonetheless, the challenges of machine translation in literary works highlight the difficulty and value of human translation in preserving the art of literature.