August 18, 2004  ·  Tim Wu

So here’s how this week’s topics connect. In response to the Balkanization point, people in commentary have been writing on the need for a better way to overcome language barriers. As Jeff Licquia put it: “One word: Esperanto.”

Believe it or not, the P2P VoIP program Skype happens to offer Esperanto as a language choice.

Skype lets you search for other Esperanto speakers. Do so and you will find listed none other than the great Chris Libertelli, senior legal advisor to Michael Powell.

Result: You can use P2P VoIP to speak to the FCC in Esperanto about its approach to digital audio. Isn’t technology wonderful?

Chris Libertelli

  • Mortazavi

    An artificial language without an accompanying, living and embodied culture is highly unlikely to survive as a vehicle for free and authentic expression of ideas. It is no better than automatic translation. It will remain a curiosity for those who have trouble connecting to real people, with real living and historical languages that carry the golden baggage of culture and tradition. Can you think of English without its master poets and philosophers of the past centuries embedded in a huge body of literature? The multi-lingual, multi-cultural solution is the approach that fosters, encourages, enriches and is enriched by diversity on the web.

  • Brian

    Automated Translation: I don’t see how this can be done with all the infinite variables that come along with language and expressing oneself. I do not see how we can keep up with ourselves on the new words and ideas that pour out every day. Translation is more than just finding the matching word by definition in the other language. The use of that word with other words can affect what they mean, regardless of what the dictionary says. With that point, which dictionary do you use?

  • Mark Kraft

    I’ve actually given this problem a lot of thought in the past, and I think I have a solution that, although brute force, is arguably the best one available.

    Machine translation is, generally speaking, proprietary and woefully inadequate. What is needed is a major open source translation application that leverages the power of many thousands of users to create a massive translation database. The application would need to start from some sort of baseline of usefulness for those who would use it, but it also needs to learn from and be taught by its users. Think of it from the perspective of something like CDDB: it would be an application designed to create an enormous database, in this case not of correct, user-verified album information but of language itself… a database that could never be built by one person alone.

    So, why would people offer up their assistance to build this database? Well, imagine if, when you registered with the program/site, you filled out your language proficiencies. You might rate yourself a 10 (native) in English and a 7 in Spanish. You want to improve your Spanish skills, however, so the application could do several things to help you, such as allowing you to exchange correspondence with others who have better skills in Spanish than you do, ideally native speakers who want to improve their English skills. You could write in Spanish, they could write in English. You correspond via the application, and when you encounter a sentence that doesn’t make sense as written, you can either correct the text or refer it to one or more suitably skilled people to fix. Alternatively, you can approve the text. All this information could be added to the database, thereby making the program learn.

    Likewise, you could use the program in a “solitaire mode”, where you could, for instance, learn vocabulary, possibly with words in a pictogram-flashcard kind of way. (Audio could also be added into the project, eventually.) If you are moderately skilled, you could also be given sentences to translate in order to improve your language skills; these sentences could be ones that other users requested translations for. The translations would then be sent on to the people in question, in order to improve their ability to properly read and translate the language. If all flagged blocks of text are used up, the application could even pull text in that language off the internet and offer it up for translation too.

    Now, this is really just one example of a piece of software that could be part of the same project. The data collected is the goal, and it could be used in many, many different ways with different applications. Many people might experience the project by using the software to translate a website; such a task could be triggered with a bookmarklet or plugin in someone’s browser, for instance. Even that information, however, can be returned in a format where people can flag or correct bad elements within the translation, thereby increasing the application’s knowledge.

    One way you could improve translations further would be to not only have an arbitrary 1-10 rating on languages, but also have some kind of computer reputation system, where people are prompted to review other people’s translations and rate their quality. Good translators would earn higher grades from the computer. Technically, the software could even be used to evaluate student proficiency and learning in languages, or evaluate the language skills of those seeking professional work.

    Now, my concern on all this is technical. How big of a database would something like this be, and would it run fast enough on the web? Would it be centralized, or distributed? Should it be a desktop app, or should it be on the web… or both?! Is there, from a technical perspective, a “sweet spot” that balances translation quality and translation speed, or is it better to be as accurate as possible, based on future expectations of speed improvements? Does creating a huge translation database necessarily slow a search for a proper translation, or will blocks of text be more easily translated than through mechanical translation methods, as identical blocks of text had already been translated in the past? Could the wealth of text on the web or web searches themselves be of value to an application that translates text? For instance, if you were to translate “I love to plant flowers.” into Spanish, would a Google search of the sentence or its fragments hint towards a preferred or alternate translation? Could it suggest these alternative translations when it’s not so certain how to translate a block of text, so that readers could choose the most appropriate one and help the database learn as it goes?

    These are ideas that I think a lot about. I don’t know the answers for sure, but the technical restraints are lessening every year. I would love to find others who would also like to make something like this a reality, and I think it should be a major goal for the open source movement to bring about a truly serious open source initiative for translation. Frankly, commercial software initiatives are poorly suited for this task; I believe that open source projects are the ones to do it.
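Editor's illustration: the comment above describes a shared database in which users submit, approve, and flag translations, and in which identical blocks of text that were translated before can simply be looked up. The sketch below shows one way that could work under simple assumptions. The class name `TranslationMemory`, its methods, and the rule of weighting each vote by the contributor's self-rated 1-10 proficiency are all hypothetical choices made for this example, not part of any existing project.

```python
from collections import defaultdict


class TranslationMemory:
    """Toy store of user-contributed translations, keyed by exact source text.

    Hypothetical design: every approval adds the contributor's self-rated
    proficiency (1-10) to a candidate's score, and every flag subtracts it.
    """

    def __init__(self):
        # (source_lang, target_lang, source_text) -> {candidate: score}
        self._entries = defaultdict(lambda: defaultdict(float))

    @staticmethod
    def _check(proficiency):
        if not 1 <= proficiency <= 10:
            raise ValueError("proficiency must be between 1 and 10")

    def submit(self, src_lang, tgt_lang, source, candidate, proficiency):
        """Record or approve a candidate translation."""
        self._check(proficiency)
        self._entries[(src_lang, tgt_lang, source)][candidate] += proficiency

    def flag(self, src_lang, tgt_lang, source, candidate, proficiency):
        """Record that a user marked a candidate translation as wrong."""
        self._check(proficiency)
        self._entries[(src_lang, tgt_lang, source)][candidate] -= proficiency

    def lookup(self, src_lang, tgt_lang, source):
        """Return known candidates for an exact match, best score first."""
        candidates = self._entries.get((src_lang, tgt_lang, source), {})
        return sorted(candidates, key=candidates.get, reverse=True)


tm = TranslationMemory()
sentence = "I love to plant flowers."
tm.submit("en", "es", sentence, "Amo plantar flores.", proficiency=7)
tm.submit("en", "es", sentence, "Me encanta plantar flores.", proficiency=10)
ranked = tm.lookup("en", "es", sentence)
unknown = tm.lookup("en", "es", "A sentence nobody has translated yet.")
```

An exact-match lookup like this is fast regardless of how large the store grows, which bears on the speed question raised in the comment; the harder problem, left out here, is what to do when only part of a block has been seen before.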

  • Matthew

    On automatic translation: language doesn’t have an infinite number of variations, or even if it does in theory, the vast majority of them will never be used. Common words, phrases and constructions make up the majority of written and spoken language, and adequate translation of these would be remarkably useful – most people use fewer than a thousand unique words in their day-to-day writing.

    Secondly, one of the current major goals of machine translation isn’t to produce a perfect output text, but rather one that’s editable by a human who’s less than perfectly fluent in the source language, making translation simpler and more accessible.

    As for dictionaries, you’re entirely correct, picking the right word sense out of a dictionary is a very difficult task. Which is why state of the art machine translation doesn’t make much use of dictionaries, at least not in the sense you seem to mean (find word A in English, output word A in French). The major tool is parallel text – the same text in two different languages (like the Canadian parliamentary records, which are kept in both French and English). This provides words in the actual context of their use, which allows much more sensitive translations.
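Editor's illustration: the comment above says that parallel text lets a system see words in the context of their use rather than in a dictionary. The sketch below shows the simplest version of that idea: counting which words tend to appear in the same aligned sentence pairs. It uses the Dice coefficient as the association score, which is a deliberately basic stand-in and not a claim about how any particular translation system works. The four sentence pairs are invented toy data, not taken from the Canadian parliamentary records, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(sentence_pairs):
    """Score (source_word, target_word) pairs with the Dice coefficient,
    counted over sentence-aligned parallel text."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        # Count each word once per sentence, ignoring case.
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        pair_counts.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_counts[s] + tgt_counts[t])
        for (s, t), n in pair_counts.items()
    }


def best_match(scores, source_word):
    """Return the target word most strongly associated with source_word."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    return max(candidates, key=candidates.get) if candidates else None


# Invented English/French toy corpus, for illustration only.
pairs = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
    ("a blue flower", "une fleur bleue"),
]
scores = cooccurrence_scores(pairs)
house = best_match(scores, "house")
blue = best_match(scores, "blue")
```

Note that nothing here tells the program that "house" means "maison"; the pairing falls out of which sentences the words share. With only four sentences the result is fragile, which is why large bodies of parallel text matter.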

  • Marcos

    1. Esperanto – it has a culture, and native speakers, and a literature, and all that stuff. What it lacks is motivation on the part of most of the world’s speakers to learn it. :)

    2. Machine Translation – we’ll have truly effective MT when we have genuine AI, and not before, although we may be able to fake it a bit better than we can now before then. See, e.g., Le Ton Beau de Marot for many, many thoughts about translation in the microworld of a single French poem.

  • Alexander Wehr

    While corporatization brings with it far more social maladies than boons, including government corruption, erosion of individual rights, and various other severe breaches of ethics all for the mighty dollar, one thing that it may very well bring as a benefit is a common commerce language.

    In many nations English is THE language of commerce. As globalization continues, the world’s wealth gap will widen and most certainly the masses will be stripped of more and more rights for the sake of profit, but there will also be a final victor as to which language becomes the language of commerce. That language may very well become the next Latin.

    Unfortunately, I do not consider this to be nearly enough of a compensation for the utter strip mining of worldwide culture, creativity, and free thought.

  • Jeff Licquia

    “The multi-lingual, multi-cultural solution is the approach that fosters, encourages, enriches and is enriched by diversity on the web.”

    Well, sure–if your language is popular. If not, you find yourself unable to share in that diversity, since no one can read what you write. Thus, you find yourself forced to write in a non-native language.

    And before you can do any writing, you have to spend years and years learning that non-native language well enough to not sound ridiculous.

    And after spending years enculturating yourself in the culture of some internationally used (read: Western European) language, how much of a representative of your original culture are you really going to be?

    This is where Esperanto’s relative lack of strong culture and its ease of learning are major advantages.

  • M. Mortazavi

    On Jeff’s response, above, to my earlier note at the top of this series of comments: To become truly multi-cultural, you actually have to live the lives of different cultures. There are people who do that and who have no trouble crossing linguistic boundaries because they have lived both sides, if not more than two sides. Young people, children in fact, do it all the time. We just don’t foster it as a society, and we should have arrangements that encourage such living at a global level. Where I live, there are already children of Western European and South Asian descent who are going to bilingual public-private elementary schools that cater to an already large Chinese population. Many people in California already speak Spanish. Also, to appreciate other cultures and languages, one doesn’t necessarily need to be a very competent writer in multiple languages, although there are many who can do this. (I can read in 5 different languages, 3 of them quite competently, and can write competently in 2 and quite badly in the rest.) Competency in reading can be accomplished in an even larger set of languages than competency in writing. So, opportunities for diversity are much better and more rewarding than the opportunities for a uniform global language lacking living communities of its own. How many people have only spoken Esperanto all of their lives? I tried to learn it and spoke it to some friends for a while. It was a curiosity, but it is still Indo-European in its structure. I doubt that someone who has not tried to learn, read, or write Chinese, and who has not learned it and lived it, could translate (figuratively, not literally, speaking) that living, that history, that tradition in full into this other language. Poets learn other languages because they know the difficulty of translating the music of each language into the other.

  • Anonymous

    So how can I meet Chris Libertelli?

  • Jeff Licquia

    It appears that the Lessig Blog has a “content problem” of some kind, so I will post my response to Mr. Mortazavi on my own blog. It appears that he has commented further on his own blog; people may be interested in that, too.

  • Tim Wu

    What is a “content problem?”

  • Jeff Licquia

    That was how the blog software described my post when it refused to add it to the comments page. I’m assuming I tripped some sort of anti-spam system.