Tag Archives: translation

Linguistic Vectors

Our friends at Google have transformed the challenge of language translation from one of linguistics to mathematics.

By mapping parts of the linguistic structure of one language in the form of vectors in a mathematical space and comparing those to the structure of a few similar words in another they have condensed the effort to equations. Their early results of an English to Spanish translation seem very promising. (Now, if they could only address human conflict, aging and death.)

Visit arXiv for a pre-print of their research.

From Technology Review:

Computer science is changing the nature of the translation of words and sentences from one language to another. Anybody who has tried BabelFish or Google Translate will know that they provide useful translation services but ones that are far from perfect.

The basic idea is to compare a corpus of words in one language with the same corpus of words translated into another. Words and phrases that share similar statistical properties are considered equivalent.

The problem, of course, is that the initial translations rely on dictionaries that have to be compiled by human experts and this takes significant time and effort.

Now Tomas Mikolov and a couple of pals at Google in Mountain View have developed a technique that automatically generates dictionaries and phrase tables that convert one language into another.

The new technique does not rely on versions of the same document in different languages. Instead, it uses data mining techniques to model the structure of a single language and then compares this to the structure of another language.

“This method makes little assumption about the languages, so it can be used to extend and re?ne dictionaries and translation tables for any language pairs,” they say.

The new approach is relatively straightforward. It relies on the notion that every language must describe a similar set of ideas, so the words that do this must also be similar. For example, most languages will have words for common animals such as cat, dog, cow and so on. And these words are probably used in the same way in sentences such as “a cat is an animal that is smaller than a dog.”

The same is true of numbers. The image above shows the vector representations of the numbers one to five in English and Spanish and demonstrates how similar they are.

This is an important clue. The new trick is to represent an entire language using the relationship between its words. The set of all the relationships, the so-called “language space”, can be thought of as a set of vectors that each point from one word to another. And in recent years, linguists have discovered that it is possible to handle these vectors mathematically. For example, the operation ‘king’ – ‘man’ + ‘woman’ results in a vector that is similar to ‘queen’.

It turns out that different languages share many similarities in this vector space. That means the process of converting one language into another is equivalent to finding the transformation that converts one vector space into the other.

This turns the problem of translation from one of linguistics into one of mathematics. So the problem for the Google team is to find a way of accurately mapping one vector space onto the other. For this they use a small bilingual dictionary compiled by human experts–comparing same corpus of words in two different languages gives them a ready-made linear transformation that does the trick.

Having identified this mapping, it is then a simple matter to apply it to the bigger language spaces. Mikolov and co say it works remarkably well. “Despite its simplicity, our method is surprisingly effective: we can achieve almost 90% precision@5 for translation of words between English and Spanish,” they say.

The method can be used to extend and refine existing dictionaries, and even to spot mistakes in them. Indeed, the Google team do exactly that with an English-Czech dictionary, finding numerous mistakes.

Finally, the team point out that since the technique makes few assumptions about the languages themselves, it can be used on argots that are entirely unrelated. So while Spanish and English have a common Indo-European history, Mikolov and co show that the new technique also works just as well for pairs of languages that are less closely related, such as English and Vietnamese.

Read the entire article here.

Language Translation With a Cool Twist

The last couple of decades has shown a remarkable improvement in the ability of software to translate the written word from one language to another. Yahoo Babel Fish and Google Translate are good examples. Also, voice recognition systems, such as those you encounter every day when trying desperately to connect with a real customer service rep, have taken great leaps forward. Apple’s Siri now leads the pack.

But, what do you get if you combine translation and voice recognition technology? Well, you get a new service that translates the spoken word in your native language to a second. And, here’s the neat twist. The system translates into the second language while keeping a voice like yours. The technology springs from Microsoft’s Research division in Redmond, WA.

[div class=attrib]From Technology Review:[end-div]

Researchers at Microsoft have made software that can learn the sound of your voice, and then use it to speak a language that you don’t. The system could be used to make language tutoring software more personal, or to make tools for travelers.

In a demonstration at Microsoft’s Redmond, Washington, campus on Tuesday, Microsoft research scientist Frank Soong showed how his software could read out text in Spanish using the voice of his boss, Rick Rashid, who leads Microsoft’s research efforts. In a second demonstration, Soong used his software to grant Craig Mundie, Microsoft’s chief research and strategy officer, the ability to speak Mandarin.

Hear Rick Rashid’s voice in his native language and then translated into several other languages:

English:

Italian:

Mandarin:

In English, a synthetic version of Mundie’s voice welcomed the audience to an open day held by Microsoft Research, concluding, “With the help of this system, now I can speak Mandarin.” The phrase was repeated in Mandarin Chinese, in what was still recognizably Mundie’s voice.

“We will be able to do quite a few scenario applications,” said Soong, who created the system with colleagues at Microsoft Research Asia, the company’s second-largest research lab, in Beijing, China.

[div class=attrib]Read the entire article here.[end-div]