Machine Translation (MT)
Machine translation software can use different approaches to translate the input:
- Translation memory: based on an existing resource of bilingual translation data. It stores recurring sentences or phrases together with the translations that were made for them, so that these translations can be reused later. Programs of this kind are also called CAT programs (Computer-Aided Translation).
- Automatic translation (rule-based): the computer analyses the text and translates it using rules for vocabulary and grammar defined by linguists.
- Automatic translation (statistical): the computer is fed billions of words of text, both monolingual text in the target language and aligned text consisting of examples of human translations between the languages. Statistical learning techniques are then applied to build a translation model.
- Before a file is translated automatically or used as input, it is recommended to pre-process it: check it for spelling errors, analyse the type of document, and, where useful, specify the names of persons it contains.
- Correct spelling is important, since misspellings can have a substantial negative impact on the quality of the machine translation output. Misspelled words cannot be matched to entries in the dictionaries, so they go untranslated. If these unknown words are among the key context words, the overall translation quality can suffer badly, because the machine's decisions about the syntactic structure of the sentence may be incorrect as well.
- In Western European languages, blank spaces are the primary indicator of a word boundary. In Chinese (and some other Asian languages), blank spaces are not used to mark word boundaries, so word segmentation is a more complex task for these languages. To solve this, morphological and syntactic information is used during translation.
- Technical texts are more appropriate for machine translation since they are less ambiguous. Novels, and especially poetry, very often give very poor results.
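The translation memory approach described above can be illustrated with a minimal sketch. It assumes a tiny in-memory store and uses Python's standard `difflib` for fuzzy matching; the sample sentences, the `lookup` function and the 0.8 threshold are hypothetical choices for illustration, not the behaviour of any particular CAT program.

```python
import difflib

# Hypothetical translation memory: source sentence -> stored human translation.
TRANSLATION_MEMORY = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Remove the battery cover.": "Retirez le couvercle de la batterie.",
}


def lookup(sentence, memory=TRANSLATION_MEMORY, threshold=0.8):
    """Return (translation, score) for the best stored match.

    The translation is None when no stored sentence is similar enough.
    The threshold is an assumed value, chosen only for this example.
    """
    # An exact match can be reused as-is.
    if sentence in memory:
        return memory[sentence], 1.0

    # Otherwise look for the most similar stored sentence (a "fuzzy match").
    best_source, best_score = None, 0.0
    for source in memory:
        score = difflib.SequenceMatcher(None, sentence, source).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= threshold:
        return memory[best_source], best_score
    return None, best_score
```

A sentence that differs from a stored one only in punctuation, such as `lookup("Press the power button!")`, still retrieves the stored translation, while unrelated input returns `None` and would have to be translated from scratch.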
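The spelling check recommended above can be sketched as a simple pre-processing pass that flags words the dictionary does not know before the text is sent for translation. The vocabulary and the function name `find_unknown_words` are assumptions made for this example; a real system would use its own, much larger dictionaries.

```python
import difflib
import re

# Hypothetical dictionary of known source-language words.
VOCABULARY = {"the", "translation", "quality", "depends", "on", "correct", "spelling"}


def find_unknown_words(text, vocabulary=VOCABULARY):
    """Map each word missing from the vocabulary to its closest known word (if any)."""
    known = sorted(vocabulary)
    report = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in vocabulary and word not in report:
            # get_close_matches returns at most n candidates, best first.
            report[word] = difflib.get_close_matches(word, known, n=1)
    return report
```

Words reported here are the ones that would otherwise go untranslated, so they are worth correcting before translation starts.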
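The difference in word segmentation mentioned above can be seen in a small sketch. Splitting on blank spaces works for a Western European sentence but returns a Chinese sentence as a single unit. As a simple stand-in for the morphological and syntactic analysis that translation systems use, the sketch applies dictionary-based forward maximum matching; the lexicon and function names are assumptions for illustration only.

```python
# Toy lexicon; an assumption for this example, not a real dictionary.
LEXICON = {"机器", "翻译", "机器翻译", "很", "有用"}


def segment_whitespace(text):
    """Split on blank spaces, which is enough for many Western European texts."""
    return text.split()


def segment_max_match(text, lexicon=LEXICON):
    """Greedy forward maximum matching: always take the longest known word."""
    max_len = max(len(word) for word in lexicon)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback for unknown text.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words
```

Greedy matching of this kind picks the wrong boundaries whenever the longest dictionary word is not the intended one, which is one reason additional linguistic information is needed in practice.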
Links
- Language processing
- Localization Directory, Resources and Information
- MT Summit IX
- White papers by Systran
- Chinesecomputing.com - Machine translation
- SYSTRANet (the technology used in Yahoo's Babel Fish)
Organisations
- EAMT (European Association for Machine Translation)
- AMTA (Association for Machine Translation in the Americas)
- AAMT (Asia-Pacific Association for Machine Translation)