
Native English speakers have a poor reputation for being backward in coming forward when called upon to communicate in languages other than English, believing that they can get by in any part of the world in their own tongue. Andy Way, however, is not one of those; for him the idea of everybody speaking the same language is a miserable one. He responded to our “Good morning” with “Egun on”; the rest of the interview was in English and peppered with ironic comments about the fact that a system of automatic English-Basque translation would have made communication that much easier. After the jokes, the interview.
True, I guess a lot of the history was... MT was investigated for political reasons; so back in the old days the Americans wanted to know what the Russians were saying, and even now there is a lot of funding, in the US particularly, for machine translation of Arabic, Farsi and Urdu. It is all security related.
So that’s communication on one level, and there’s communication let’s say between you and me.
It would be much easier, let’s say, if you were on your computer in your office and I was in Dublin; it’d be much easier if you could type in Spanish or even Basque and the machine translation system translated that into English for me, and I’d type in English and have it translated back. Because it doesn’t matter if it isn’t perfect English; if I can speak English, that’s irrelevant. You know, for someone who speaks Basque but doesn’t speak English, bad Basque is better than perfect English. If you can get the general message, the general understanding from a text, at one level that’s all we need to do.
So the biggest growth area of MT at the moment is on the individual basis; the key is that two people may be able to communicate via these online systems without necessarily knowing the same language.
Yes, there is no doubt. Even the systems that are freely available, things like Babel Fish for example, are not very sophisticated, but they enable people to communicate in their own languages. And these systems, of which there are several on the Internet, are free: they don’t cost anything, and you don’t have to subscribe or be a member.
They have very high usage, but their quality isn’t very good. So if we can improve the quality of these translation systems, you would expect more and more people to use them.
In Europe there has been a big change. For example, back in the 80s there were only 9 official languages of the EU, and now we have more than 20. All of those languages are equal in the eyes of the EU, so all of those texts have to be translated into all of those languages. There are over 400 language pairs now.
Let’s say Latvian and Greek: how many translators can translate Latvian into Greek? Not very many.
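As a rough check on the “over 400 language pairs” figure, here is a minimal sketch of the arithmetic. It assumes each translation direction counts as its own pair (Latvian into Greek and Greek into Latvian are two pairs); the function name is illustrative, and the language counts other than 9 are examples consistent with “more than 20”, not figures from the interview.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of (source, target) translation directions among n languages."""
    # Each language can be translated into every other language.
    return n_languages * (n_languages - 1)


if __name__ == "__main__":
    for n in (9, 21, 23):
        print(n, "languages ->", directed_pairs(n), "pairs")
    # 9 languages -> 72 pairs
    # 21 languages -> 420 pairs
    # 23 languages -> 506 pairs
```

Under this assumption, any number of official languages above 20 already gives more than 400 pairs, and the count grows roughly with the square of the number of languages.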
In the European Commission they have their own in-house version of an MT system called SYSTRAN. It’s not the same SYSTRAN system you’d buy in a shop. It’s for internal consumption. And they use it to do draft translations, and on the whole you still need a human to post-edit, to correct all the mistakes. If you have to send these documents outside to your customers or to the general public, they need to be perfect.
The good thing is that these are just machines. If we want them to, these (computers) can work 24 hours a day; we can’t work 24 hours a day. So the advantage is that they can operate much more quickly than we can, but at a lower quality.
People shouldn’t view these things as a replacement activity; it’s just like any other machine: a telephone, a toaster or, you know, a car. It’s just a machine, which helps us.
Yes, for instance we are in contact with the University of the Basque Country. They work with the English-Basque and Spanish-Basque pairs. And so, we have added more and more language pairs. So we have people doing Arabic, Chinese, Italian, French, German, Spanish and now Basque.
I have a student doing sign-language translation, from English to Irish Sign Language, in the airport scenario. Not all the information is on the screens; they might announce “Quickly! Andy Way, go to gate 50, you are delayed”, and if you are deaf, you won’t hear it. So we are hoping to operate in that area.
Again, if you can restrict the area of application, it doesn’t matter what language pair you are translating between; it becomes much easier. So if you operate in the domain of airports rather than all of language, many of the everyday problems disappear.
Most of what people do in the research field nowadays is work on corpus-based MT. So what you need is a corpus (let’s say between Spanish and Basque): you have many sentences of Spanish and their translations in Basque. Now, for some of the bigger languages you can get what they call parallel corpora, where this sentence translates this other sentence. For English-French they exist, as we can use parliamentary proceedings in Canada, and for English-Chinese we can use the Hong Kong parliament data.
But if you want to do, let’s say, English-Basque, where do you get the corpus from? So, in principle all the techniques we use nowadays can be applied to any language pair. But if you wanted to do Basque-Irish, there is no parallel corpus. So that is one of the problems for minority languages. We just don’t have the amount of text available that there is for Spanish, or English, or French... that’s the biggest problem.
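To make the idea of a parallel corpus concrete, here is a minimal sketch, not the method used by Way’s group or by any real MT system. It assumes a tiny invented English-Spanish corpus and guesses a word’s translation by a simple co-occurrence score (the Dice coefficient); the names `parallel_corpus` and `best_translation` are hypothetical.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus: each entry is (source sentence, its translation).
# These four sentence pairs are invented for illustration only.
parallel_corpus = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]


def best_translation(word, corpus):
    """Guess the target word most strongly associated with `word`.

    Returns None when `word` never appears in the corpus: with no
    parallel data there is nothing to learn from.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Count every source/target word pair seen in the same sentence pair.
        pair_count.update(product(src_words, tgt_words))
    if word not in src_count:
        return None
    # Dice coefficient: 2 * co-occurrences / (source count + target count).
    scores = {
        t: 2 * pair_count[(word, t)] / (src_count[word] + tgt_count[t])
        for t in tgt_count
        if pair_count[(word, t)] > 0
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(best_translation("house", parallel_corpus))   # casa
    print(best_translation("window", parallel_corpus))  # None
```

The second call shows the minority-language problem in miniature: a word, or a whole language pair, that is missing from the corpus cannot be translated by a purely corpus-based approach.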
The major challenge really is the quality; the overall quality is still not good enough. And again, while we use these corpus-driven approaches nowadays, as we have said already, for certain language pairs you can’t do translation using a corpus-based approach because the corpora do not exist. For example, Basque-Irish.
So one problem is deriving the resources that you need to even begin translation between one pair of languages.
I think one way in which we could make a big impact would be to ... come up with an application which would be used in lots of people’s homes. It could be something really basic, because in the university most of the time we try to solve very hard problems, and where there are simple solutions, people in universities typically find them uninteresting. But if you can make an application at the simplest, lowest level, you really help people’s day-to-day lives; with the sign language, for example, that could have a real effect.
If you talk to people, they can understand why MT is necessary and useful. Many other computational linguistic problems and applications are very hard for people to understand. But everybody knows what translation is and what computers are, and you know that you would like to improve communication, so I think people in general understand that this is necessary.
The very first question you asked me was about communication. I think spoken language translation will come soon, so we’ll each be able to speak in our own individual language: you’ll be able to speak in Basque and I will hear it in English. So, if you want to communicate, we can do speech recognition: you’ll hear me speak in Basque on your computer, and I will hear you speak in English on my computer. So I think that will come as well. I think that’s a good thing as well, because speech is much more natural. (...)
Author: Nagore Rementeria
Elhuyar Zientzia eta Teknika magazine nº 233
The use of the contents of this website is forbidden without permission.
Copyright © 2007 Elhuyar Fundazioa