Monday, March 04, 2013

The vodka is good, but the meat is rotten

The title comes from an old joke, based on reality, about computers that translate words directly. Take "the spirit is willing but the flesh is weak," an old Biblical phrase: crunching it through the machine into other languages can amount to disaster. In this case the saying was crunched into Russian and then back again, yielding a quite different phrase, but one which I like nevertheless.

For decades people have rested comfortably with the idea that computers can translate words directly but will always do a poor job with suggested or metaphorical meanings, as in the case above. These people aren't fully aware of the resources at computers' disposal. These days Google Translate (GT) is at the forefront of natural language processing (NLP), and this is interesting, because Google, at least, has the enormous resources that can help change the equation. I'm not suggesting that NLP has become perfect, but rather that it's on a steady upward progression (approaching 1, we could say). With each step closer to accurately translating a phrase such as the one above, there are adjustments we out here will have to make to that capability.

Right off the bat, a person with a sharp eye could spot that any text containing the string "the spirit is willing but the flesh is weak" is probably using the common saying that we all recognize, which has a metaphorical meaning rather than a specific interpretation involving someone's flesh. That person could simply program the meaning of such a string into any translation program. The machine translator could furthermore warn the writer that 99% of the time people use this expression with a certain metaphorical meaning, but that it's not out of the realm of possibility that they are referring to specific flesh. This is in fact how we hear such things. We keep track of the averages and try to keep an open mind to the more literal possibilities.
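To make the idea concrete, here is a rough sketch of that kind of lookup in Python. It is only an illustration under my own assumptions, not a description of how Google Translate or any real translator works: the table, the names (IdiomEntry, IDIOMS, find_idioms) and the plain-English gloss are all made up for the example, and the 0.99 is just the "99% of the time" figure from above, not a measured number.

```python
from dataclasses import dataclass


@dataclass
class IdiomEntry:
    gloss: str           # plain-language meaning to translate instead of the words
    p_figurative: float  # assumed share of uses that are figurative


# Hypothetical hand-built table. The 0.99 is illustrative, not measured.
IDIOMS = {
    "the spirit is willing but the flesh is weak": IdiomEntry(
        gloss="I want to do it, but I lack the strength",
        p_figurative=0.99,
    ),
}


def normalize(text):
    """Lowercase and drop punctuation so 'willing, but' still matches."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def find_idioms(text):
    """Return (idiom, entry, warning) for each known idiom found in text."""
    norm = normalize(text)
    hits = []
    for idiom, entry in IDIOMS.items():
        if idiom in norm:
            literal_pct = round((1 - entry.p_figurative) * 100)
            warning = (
                f"Usually figurative ({entry.p_figurative:.0%} of uses); "
                f"a literal reading is still possible ({literal_pct}%)."
            )
            hits.append((idiom, entry, warning))
    return hits


if __name__ == "__main__":
    sentence = 'He sighed: "The spirit is willing, but the flesh is weak."'
    for idiom, entry, warning in find_idioms(sentence):
        print("Found:", idiom)
        print("Translate the meaning:", entry.gloss)
        print("Note:", warning)
```

A translator built this way would hand the gloss, rather than the individual words, to whatever does the actual translating, and would show the note to the writer so the literal reading stays on the table.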

Thus Google Translate comes closer and closer to being like us, calculating the probable meanings of expressions in every language, including ours. We laugh at the computer's inability to detect sarcasm, metaphor, shades of meaning, etc. We underestimate the vast pools of data that GT technicians can comb through. We underestimate the fact that, as one of the biggest viable companies in the world, Google can simply ask its employees, "What does this phrase mean to you?" and run with their answers. In fact, it has considerably more resources than simply the ability to translate each word directly into another language. More on this later.



