Import transfers a text and its translation from a text file into the TM. Import can be done from a raw format, in which an external source text is available for importing into a TM along with its translation; sometimes such texts have to be reprocessed by the user. Import can also be done from the native format, which is the format the TM itself uses to save translation memories to a file.
The process of analysis involves the following steps:
Parsing the text requires recognizing punctuation correctly, for example to distinguish a full stop that ends a sentence from a full stop in an abbreviation. Marking up such cases is therefore a kind of pre-editing. Materials that have been processed with translators' aid programs usually already contain mark-up, because the translation stage is embedded in a multilingual document production line. Other special text elements may also be set off by mark-up: some, such as proper names and codes, do not need to be translated, while others may need to be converted to the native format.
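The full-stop ambiguity described above can be illustrated with a minimal sentence splitter. This is a sketch, not how any particular TM tool works: the abbreviation list `ABBREVIATIONS` and the function `split_sentences` are hypothetical, and a real tool would rely on language-specific resources or on mark-up added during pre-editing.

```python
# Hypothetical abbreviation list; a real tool would use a larger,
# language-specific resource or mark-up added during pre-editing.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr.", "Mrs.", "No."}


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at '.', '?' or '!', unless the token
    ending in a full stop is a known abbreviation."""
    sentences: list[str] = []
    current: list[str] = []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "?", "!")) and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Smith arrived. He left early, e.g. at noon."))
# ['Dr. Smith arrived.', 'He left early, e.g. at noon.']
```

An abbreviation that genuinely ends a sentence (such as a final "etc.") would not be split here; that residual ambiguity is exactly what explicit mark-up resolves.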
Base form reduction is used to prepare word lists and texts for automatic retrieval of terms from a term bank. Syntactic parsing, on the other hand, may be used to extract multi-word terms or phraseology from a source text. Parsing can thus normalise word-order variation in phraseology, that is, determine which words can form a phrase.
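As a rough illustration of base form reduction before term-bank lookup, the sketch below strips a few English plural suffixes and then looks the result up. The term bank `TERM_BANK` and the suffix rules are hypothetical and deliberately crude; a production system would use a proper morphological analyser for each language.

```python
# Hypothetical miniature term bank keyed by base form (English -> French).
TERM_BANK = {
    "memory": "mémoire",
    "segment": "segment",
    "translation": "traduction",
}


def base_form(word: str) -> str:
    """Crude base form reduction by suffix stripping (English plurals only)."""
    w = word.lower().strip(".,;:!?")
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"  # memories -> memory
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]        # segments -> segment
    return w


def lookup_terms(text: str) -> dict[str, str]:
    """Reduce every word to its base form and look it up in the term bank."""
    found: dict[str, str] = {}
    for word in text.split():
        base = base_form(word)
        if base in TERM_BANK:
            found[base] = TERM_BANK[base]
    return found


print(lookup_terms("Translations are stored in memories as segments."))
# {'translation': 'traduction', 'memory': 'mémoire', 'segment': 'segment'}
```

Without the reduction step, none of the three inflected words would match an entry stored under its base form.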
The purpose of segmentation is to choose the most useful translation units. Segmentation is a type of parsing: it is done monolingually, using superficial parsing, and alignment is then based on the resulting segments. If translators correct the segmentation manually, later versions of the document will not find matches against TM entries based on the corrected segmentation, because the program will repeat its own segmentation errors. Translators usually proceed sentence by sentence, although the translation of one sentence may depend on the translation of the surrounding ones.
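The problem with manually corrected segmentation can be shown with a small sketch. The naive regular-expression segmenter `segment` and the sample TM are assumptions for illustration only: the segmenter wrongly breaks after "e.g.", the translator stored the corrected segment, and a later version of the document is segmented with the same mistake.

```python
import re


def segment(text: str) -> list[str]:
    """Naive automatic segmentation: break after '.', '?' or '!' followed by
    whitespace. It wrongly breaks after abbreviations such as 'e.g.'."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


# The translator corrected the segmentation by hand before translating,
# so the TM stores the merged segment "Use a key, e.g. F1."
tm = {
    "Use a key, e.g. F1.": "Utilisez une touche, par ex. F1.",
    "Then restart.": "Redémarrez ensuite.",
}

# A later version of the document is segmented automatically again,
# and the program repeats its original mistake.
new_version = "Use a key, e.g. F1. Then restart."
print(segment(new_version))                          # ['Use a key, e.g.', 'F1.', 'Then restart.']
print([s for s in segment(new_version) if s in tm])  # ['Then restart.']
```

The corrected entry is never retrieved, because no automatically produced segment is identical to it.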
Alignment is the task of defining translation correspondences between source and target texts. There should be feedback from alignment to segmentation, and a good alignment algorithm should be able to correct the initial segmentation.
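One classic way to compute correspondences is length-based sentence alignment by dynamic programming, loosely following the idea of Gale and Church. The sketch below is a simplified variant, not the method of any specific TM product: the cost function and the penalties in `BEADS` are hypothetical. Allowing 1:2 and 2:1 groupings is one way alignment can effectively correct the initial segmentation.

```python
from math import inf

# Allowed groupings ("beads") of source:target sentences, with a hypothetical
# extra penalty for anything other than a 1:1 correspondence.
BEADS = {(1, 1): 0.0, (1, 2): 0.2, (2, 1): 0.2, (1, 0): 1.0, (0, 1): 1.0}


def bead_cost(src_chars: int, tgt_chars: int, penalty: float) -> float:
    """Relative length difference of the grouped sentences plus the penalty."""
    total = src_chars + tgt_chars
    return (abs(src_chars - tgt_chars) / total if total else 0.0) + penalty


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Return the cheapest sequence of beads as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Cells are visited in row-major order, so every predecessor of (i, j)
    # already has its final cost when (i, j) is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(
                    sum(len(s) for s in src[i:ni]),
                    sum(len(t) for t in tgt[j:nj]),
                    penalty,
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the alignment.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs


src = ["Press Start and wait until the menu opens.", "Close the window."]
tgt = ["Appuyez sur Start.", "Attendez que le menu s'ouvre.", "Fermez la fenêtre."]
print(align(src, tgt))  # [([0], [0, 1]), ([1], [2])]
```

Real aligners usually also use lexical cues such as cognates or bilingual dictionaries; sentence length alone is a weak signal for short segments.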
Term extraction can take a previous dictionary as input. Moreover, when extracting unknown terms, it can use parsing based on text statistics. Statistics are also used to estimate the amount of work involved in a translation job, which is very useful for planning and scheduling the work. Translation statistics usually count the words and estimate the amount of repetition in the text.
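A minimal sketch of such translation statistics, assuming the text has already been segmented: it counts words and treats every repeated occurrence of an identical segment as repetition. The function name and the result keys are hypothetical; commercial tools use their own, often more detailed, counting rules.

```python
from collections import Counter


def translation_statistics(segments: list[str]) -> dict[str, int]:
    """Count words and estimate internal repetition over source segments."""
    counts = Counter(segments)
    total_words = sum(len(s.split()) for s in segments)
    # Words in segments that already occurred earlier in the same text.
    repeated_words = sum(len(seg.split()) * (n - 1) for seg, n in counts.items())
    return {
        "segments": len(segments),
        "unique_segments": len(counts),
        "words": total_words,
        "repeated_words": repeated_words,
        "new_words": total_words - repeated_words,
    }


print(translation_statistics(
    ["Press Start.", "The menu opens.", "Press Start.", "Select a file."]
))
# {'segments': 4, 'unique_segments': 3, 'words': 10, 'repeated_words': 2, 'new_words': 8}
```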
Export transfers the text from the TM into an external text file. Import and export should be inverses of each other: importing an exported file should reproduce the original TM contents.
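The inverse relationship between import and export can be checked with a round trip. This sketch uses a hypothetical tab-separated file layout rather than any tool's native format or an exchange standard such as TMX; the csv module quotes fields that contain tabs, quotes or newlines, which is what keeps the round trip lossless.

```python
import csv
import io


def export_tm(tm: list[tuple[str, str]]) -> str:
    """Write (source, target) pairs as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for source, target in tm:
        writer.writerow([source, target])
    return buf.getvalue()


def import_tm(text: str) -> list[tuple[str, str]]:
    """Read pairs back from text produced by export_tm."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], row[1]) for row in reader]


tm = [("Press Start.", "Appuyez sur Start."), ('Say "OK".', 'Dites "OK".')]
assert import_tm(export_tm(tm)) == tm  # export followed by import is lossless
```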
When translating, one of the main purposes of the TM is to retrieve the most useful matches in the memory so that the translator can choose the best one. The TM must show both the source and target text, pointing out what is identical and what differs between the stored segment and the current one.
Several different types of matches can be retrieved from a TM.
Exact matches occur when the current source segment and the stored one match character by character. When translating a sentence, an exact match means that the same sentence has been translated before. Exact matches are also called “100% matches”.
In-context exact (ICE) match or guaranteed match
An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph. Context is often defined by the surrounding sentences and attributes such as document file name, date, and permissions.
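The difference between an exact match and an ICE match can be sketched by storing, with each unit, the segments that surrounded it. Here context is reduced to the previous and next source segments only; the `TMUnit` fields and the `classify` function are hypothetical, and real tools may also compare attributes such as file name or date.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TMUnit:
    source: str
    target: str
    prev_source: Optional[str]  # segment that preceded it when it was stored
    next_source: Optional[str]  # segment that followed it


def classify(tm: list[TMUnit], segment: str, prev_seg: Optional[str],
             next_seg: Optional[str]) -> tuple[str, Optional[str]]:
    """Return ('ICE' | 'exact' | 'none', translation or None)."""
    exact = [u for u in tm if u.source == segment]
    if not exact:
        return "none", None
    for u in exact:
        # Same text and same neighbours: an in-context exact match.
        if u.prev_source == prev_seg and u.next_source == next_seg:
            return "ICE", u.target
    return "exact", exact[0].target


tm = [
    TMUnit("Click OK.", "Cliquez sur OK.", "Save the file.", "Close the window."),
    TMUnit("Click OK.", "Appuyez sur OK.", None, None),
]
print(classify(tm, "Click OK.", "Save the file.", "Close the window."))  # ('ICE', 'Cliquez sur OK.')
print(classify(tm, "Click OK.", "Open the menu.", None))                 # ('exact', 'Cliquez sur OK.')
```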
When the match is not exact, it is a “fuzzy” match. Some systems assign percentages to these matches, in which case a fuzzy match scores greater than 0% and less than 100%. These figures are not comparable across systems unless the scoring method is specified.
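To make the point about scoring concrete, the sketch below computes one possible percentage using Python's standard `difflib.SequenceMatcher.ratio()`, a character-based similarity measure. This is an assumption chosen for illustration; other systems use word-based edit distance or their own weighting, which is why their percentages differ. The default threshold of 70 is likewise hypothetical.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> int:
    """Percentage similarity based on difflib's ratio (one of many methods)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_matches(tm: dict[str, str], segment: str,
                 threshold: int = 70) -> list[tuple[int, str, str]]:
    """Return (score, source, target) for entries at or above the threshold,
    best first."""
    scored = [(match_score(segment, src), src, tgt) for src, tgt in tm.items()]
    return sorted((m for m in scored if m[0] >= threshold), reverse=True)


tm = {
    "Click the OK button.": "Cliquez sur le bouton OK.",
    "Open the file.": "Ouvrez le fichier.",
}
for score, src, tgt in best_matches(tm, "Click the Cancel button."):
    print(f"{score}%  {src}  ->  {tgt}")
```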
When the translator selects one or more words in the source segment, the system retrieves segment pairs that match the search criteria. This feature is helpful for finding translations of terms and idioms in the absence of a terminology database.
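A simple version of this search treats the selected words as a set and returns every stored pair whose source segment contains all of them. The function names and the word-matching rule (lower-cased `\w+` tokens, any order) are assumptions for this sketch; real tools typically also offer phrase, wildcard or fuzzy search.

```python
import re


def words(text: str) -> set[str]:
    """Lower-cased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def concordance(tm: list[tuple[str, str]], query: str) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose source contains every query word."""
    wanted = words(query)
    return [(s, t) for s, t in tm if wanted <= words(s)]


tm = [
    ("Save the file.", "Enregistrez le fichier."),
    ("Open a new file.", "Ouvrez un nouveau fichier."),
    ("Close the window.", "Fermez la fenêtre."),
]
print(concordance(tm, "file"))
# [('Save the file.', 'Enregistrez le fichier.'), ('Open a new file.', 'Ouvrez un nouveau fichier.')]
```

Seeing several target segments side by side lets the translator infer how a term has been translated before.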
A TM is updated with a new translation once the translator has accepted it. As always when updating a database, there is the question of what to do with the previous contents. A TM can be modified by changing or deleting entries, and some systems allow translators to save multiple translations of the same source segment.
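The choice between overwriting previous contents and keeping several translations can be expressed as an update policy. The `TranslationMemory` class and its `keep_multiple` flag below are hypothetical, a minimal in-memory sketch rather than any product's API.

```python
from collections import defaultdict


class TranslationMemory:
    """In-memory TM with a configurable policy for repeated source segments."""

    def __init__(self, keep_multiple: bool = False):
        self.keep_multiple = keep_multiple
        self.entries: dict[str, list[str]] = defaultdict(list)

    def add(self, source: str, target: str) -> None:
        """Store a translation once the translator has accepted it."""
        if self.keep_multiple:
            if target not in self.entries[source]:
                self.entries[source].append(target)
        else:
            self.entries[source] = [target]  # overwrite previous contents

    def delete(self, source: str) -> None:
        self.entries.pop(source, None)

    def lookup(self, source: str) -> list[str]:
        return list(self.entries.get(source, []))


single = TranslationMemory()
single.add("Close.", "Fermer.")
single.add("Close.", "Fermez.")
print(single.lookup("Close."))  # ['Fermez.']

multi = TranslationMemory(keep_multiple=True)
multi.add("Close.", "Fermer.")
multi.add("Close.", "Fermez.")
print(multi.lookup("Close."))   # ['Fermer.', 'Fermez.']
```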
Translation memory tools often provide automatic retrieval and substitution.
TMs are searched and their results displayed automatically as a translator moves through a document.
With automatic substitution, if an exact match comes up in translating a new version of a document, the software will repeat the old translation. If the translator does not check the translation against the source, a mistake in the previous translation will be repeated.
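The risk described above can be reproduced in a few lines. In this sketch, `pretranslate` and its status labels are hypothetical: exact matches are substituted automatically, and a mistake stored in the TM ("Stop" for "Start") is carried into the new version unless the translator reviews the flagged segments.

```python
def pretranslate(segments: list[str], tm: dict[str, str]) -> list[dict]:
    """Substitute stored translations for exact matches and flag them for review."""
    result = []
    for seg in segments:
        if seg in tm:
            # Exact match: reuse the old translation, but mark it so the
            # translator still checks it against the source.
            result.append({"source": seg, "target": tm[seg], "status": "needs review"})
        else:
            result.append({"source": seg, "target": None, "status": "untranslated"})
    return result


# The stored translation contains a mistake ("Stop" instead of "Start").
tm = {"Press Start.": "Appuyez sur Stop."}
for row in pretranslate(["Press Start.", "Select a file."], tm):
    print(row)
```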
Networking enables a group of translators to translate a text together faster than if each were working in isolation, because sentences and phrases translated by one translator are available to the others. Moreover, if translation memories are shared before the final translation, there is an opportunity for mistakes by one translator to be corrected by other team members.
“Text memory” is the basis of the proposed LISA OSCAR xml:tm standard. Text memory comprises author memory and translation memory.
In xml:tm, each text unit of the source document carries a unique identifier. These identifiers are remembered during translation so that the target language document is ‘exactly’ aligned at the text unit level. If the source document is subsequently modified, the text units that have not changed can be transferred directly to the new target version of the document without any translator interaction. This is the concept of ‘exact’ or ‘perfect’ matching to the translation memory. xml:tm can also provide mechanisms for in-document leveraged and fuzzy matching.
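The idea of identifier-based exact matching can be sketched with plain dictionaries keyed by text-unit ID. This does not use the actual xml:tm namespace or element syntax; the IDs such as "tu1" and the function `transfer_unchanged` are hypothetical. A unit whose ID and source text are both unchanged keeps its translation, while new or edited units are queued for translation.

```python
def transfer_unchanged(old_source: dict[str, str], old_target: dict[str, str],
                       new_source: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Carry over translations of unchanged text units; list the IDs that
    still need translating (new or modified units)."""
    carried: dict[str, str] = {}
    todo: list[str] = []
    for uid, text in new_source.items():
        if old_source.get(uid) == text and uid in old_target:
            carried[uid] = old_target[uid]
        else:
            todo.append(uid)
    return carried, todo


old_source = {"tu1": "Press Start.", "tu2": "The menu opens."}
old_target = {"tu1": "Appuyez sur Start.", "tu2": "Le menu s'ouvre."}
new_source = {"tu1": "Press Start.", "tu2": "The main menu opens.", "tu3": "Select a file."}

carried, todo = transfer_unchanged(old_source, old_target, new_source)
print(carried)  # {'tu1': 'Appuyez sur Start.'}
print(todo)     # ['tu2', 'tu3']
```

Because matching is by identifier rather than by searching a separate database, no similarity scoring is needed for the unchanged units.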