The WFC Translation memory format Wordfast Classic
Revision as of 20:08, 5 November 2017
A Wordfast translation memory is a tab-delimited text file. It is the simplest of all formats: it can be opened with text editors such as Notepad, with Unicode-compliant word processors, or with Excel. Wordfast TMs can be regular ANSI (8-bit) text or Unicode UTF-16 (either little-endian or big-endian).
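A minimal Python sketch of how a reader might choose the encoding of a TM file. It assumes (this page does not say so) that UTF-16 files begin with a byte-order mark and that ANSI files use the Windows-1252 code page; the function name `decode_wordfast_tm` is hypothetical.

```python
import codecs


def decode_wordfast_tm(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Decode the bytes of a TM file (hypothetical helper, not part of Wordfast).

    Assumption: UTF-16 files start with a byte-order mark (BOM); anything
    without a UTF-16 BOM is treated as 8-bit ANSI text in `ansi_codepage`.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return raw.decode(ansi_codepage)
```

Typical use would be `decode_wordfast_tm(open("my_tm.txt", "rb").read())`, where the file name is only an example.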
A Translation Memory (TM) is a set of lines (paragraphs) of text; in a pure text file where the display does not wrap, lines are paragraphs. The very first line is a header, and all other lines are TUs (Translation Units), sometimes called "entries". Lines (entries, TUs) are sets of fields, a field being any text (even no text at all, which denotes an empty field) followed by a tabulator. In other words, the Wordfast TM format is tab-delimited text, arguably one of the oldest, most robust, open and easy-to-manipulate data formats. In the header (the very first line of a TM), each field begins with a % (per cent) sign.
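The line and field structure described above can be sketched as follows. `split_tm` is a hypothetical helper, not a Wordfast function; treating only `\n` (optionally preceded by `\r`) as a line break and skipping blank lines are assumptions of this sketch.

```python
from typing import List, Tuple


def split_tm(text: str) -> Tuple[List[str], List[List[str]]]:
    """Split TM text into header fields and TUs (each TU is a list of fields)."""
    # Assumption: paragraphs end with "\n" or "\r\n"; blank lines are skipped.
    lines = [ln.rstrip("\r") for ln in text.split("\n")]
    lines = [ln for ln in lines if ln]
    if not lines:
        raise ValueError("empty TM: no header line")
    # The first line is the header; every non-empty header field starts with "%".
    header = [f for f in lines[0].split("\t") if f]
    if not all(f.startswith("%") for f in header):
        raise ValueError("first line is not a Wordfast TM header")
    # Every other line is a TU; fields are separated by tabulators.
    tus = [ln.split("\t") for ln in lines[1:]]
    return header, tus
```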
Fields making up a TU:
Field | Example | Format | Remark |
Date | 20041231~165410 | yyyymmdd~hhmmss. The example means 31 December 2004, at 16:54:10, local time. See the note on the tilde (~) character further below. | Optional field: can be empty |
User ID (Attribute #1) | YAC | Initials of the TU's creator. | Optional field: can be empty |
Counter | 5 | A number between 0 and 9999 that records how many times this TU was proposed as a 100% match and accepted, that is, re-used as is. | Optional field: can be empty |
Source language | EN-US | TMX-compliant language code (but case-insensitive in WFC). It is made of a two-letter ISO language code, optionally followed by a dash and a two-letter local variant. | Optional field: can be empty. Rule: field cannot be longer than 5 characters. |
Source segment | Red Riding Hood was walking in the woods. | The source segment. Maximum size: 8000 Unicode characters. | Should contain at least one character. |
Target language | FR-FR | TMX-compliant language code | Optional field: can be empty. Rule: field cannot be longer than 5 characters. |
Target segment | Le Petit Chaperon Rouge se promenait dans les bois. | The target segment. Maximum size: 8000 Unicode characters. | Optional field: can be empty |
Attribute #2 (optional) | EL | A mnemonic (maximum length: 64 characters; no spaces allowed) for user-defined attribute #1. See Wordfast's "Sample" attributes. | Optional field: can be empty, and its tabulator omitted |
Attribute #3 (optional) | PS | | Optional field: can be empty, and its tabulator omitted |
Attribute #4 (optional) | | | Optional field: can be empty, and its tabulator omitted |
Attribute #5 (optional) | | | Optional field: can be empty, and its tabulator omitted |
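Assuming the field order in the table, one TU line can be mapped onto a record roughly as below. `TranslationUnit` and `parse_tu` are illustrative names, not part of Wordfast, and the sketch does not enforce the 8000-character or 64-character limits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranslationUnit:
    date: str               # e.g. "20041231~165410"; may be empty
    user_id: str            # Attribute #1; may be empty
    counter: Optional[int]  # None when the field is empty
    source_lang: str
    source: str
    target_lang: str
    target: str
    attributes: List[str] = field(default_factory=list)  # Attributes #2 to #5, if present


def parse_tu(line: str) -> TranslationUnit:
    """Map one tab-delimited TU line onto the fields listed in the table."""
    fields = line.split("\t")
    if len(fields) < 7:  # fewer than 6 tabulators: cannot be a valid TU
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    date, user_id, counter, source_lang, source, target_lang, target = fields[:7]
    # Up to four optional attributes follow; drop the empty piece(s)
    # left after the final tabulator.
    attributes = fields[7:11]
    while attributes and attributes[-1] == "":
        attributes.pop()
    return TranslationUnit(
        date=date,
        user_id=user_id,
        counter=int(counter) if counter.strip() else None,
        source_lang=source_lang,
        source=source,
        target_lang=target_lang,
        target=target,
        attributes=attributes,
    )
```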
Here are the first two paragraphs (the TM's header and first Translation Unit) of a TM where the TU is defined as in the table above, with tabulators shown as |. Paragraphs are long, so they may wrap in your display, but there are only two paragraphs:
%20041231~160445 | %YAC, Yves A. Champollion | %TU=00000000 | %EN-US | %Wordfast TM v5.0 | %FR-FR | %87412764 |
20041231~165410 | YAC | 5 | EN-US | Red Riding Hood was walking in the woods. | FR-FR | Le Petit Chaperon Rouge se promenait dans les bois. | EL | PS |
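Writing a line like the ones above is the reverse operation: each field is followed by a tabulator, so an empty optional field still leaves its tabulator behind. A minimal sketch with a hypothetical `format_tm_line` helper; rejecting fields that contain a tabulator or line break is an assumption, since this page does not say how such characters would be stored.

```python
from typing import List


def format_tm_line(fields: List[str]) -> str:
    """Join fields so that each one is followed by a tabulator."""
    for f in fields:
        # Assumption: a tabulator or line break inside a field would shift
        # or split the line, so such fields are refused.
        if any(ch in f for ch in "\t\r\n"):
            raise ValueError(f"field contains a tabulator or line break: {f!r}")
    return "".join(f + "\t" for f in fields)


# The example TU from this page, rebuilt field by field.
example = format_tm_line([
    "20041231~165410", "YAC", "5", "EN-US",
    "Red Riding Hood was walking in the woods.", "FR-FR",
    "Le Petit Chaperon Rouge se promenait dans les bois.", "EL", "PS",
])
```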
When reading a TU, Wordfast errs on the side of optimism if the TU does not look correct or canonical. When, in a TU:
- the date is missing: if Wordfast is executing a loop that parses TUs, it will take the previous TU's date and increment it by one second; otherwise, it will take the local machine's current date and time;
- the user ID is empty: Wordfast will assume the TM header's user ID. If that is missing, Wordfast will use the user's identity as defined in MS Word. If that is missing too, Wordfast will use XX;
- a language code is missing or incorrect, but no longer than 5 characters: Wordfast will use the corresponding language code from the TM header (the first line of the TM).
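The fallback rules above could be approximated as follows. This is not Wordfast's code: the MS Word identity is passed in as a plain parameter, an "incorrect" language code is assumed to mean anything other than two letters optionally followed by a dash and two letters, and falling back to the clock when the previous date is unreadable is also an assumption.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumption: a correct code is "xx" or "xx-yy" (letters, any case).
LANG_CODE = re.compile(r"[A-Za-z]{2}(?:-[A-Za-z]{2})?")
OUT_FORMAT = "%Y%m%d~%H%M%S"


def parse_wf_date(value: str) -> datetime:
    """Read yyyymmdd?hhmmss, ignoring whatever separator sits in position 9."""
    return datetime.strptime(value[:8] + value[9:15], "%Y%m%d%H%M%S")


def default_date(previous: Optional[str], in_loop: bool) -> str:
    """Date to use when a TU's date is missing."""
    if in_loop and previous:
        try:
            return (parse_wf_date(previous) + timedelta(seconds=1)).strftime(OUT_FORMAT)
        except ValueError:
            pass  # assumption: unreadable previous date, fall back to the clock
    return datetime.now().strftime(OUT_FORMAT)


def default_user_id(tu_user: str, header_user: str, word_user: str) -> str:
    """TU user ID, then the header's user ID, then the MS Word identity, then "XX"."""
    for candidate in (tu_user, header_user, word_user):
        if candidate:
            return candidate
    return "XX"


def default_lang(code: str, header_code: str) -> str:
    """Replace a missing or malformed language code with the header's code."""
    if len(code) > 5:
        raise ValueError("language code longer than 5 characters: faulty TU")
    return code if LANG_CODE.fullmatch(code) else header_code
```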
Fault detection (Wordfast considering that a TU is a bad one) is based on counting how many tabulators are in a line of text: a line with fewer than 6 tabulators cannot form a valid TU. Wordfast also checks that language codes are no longer than 5 characters. A language code of more than 5 characters encountered during a TM reorganisation indicates that something is amiss with that particular TU, and the TU is assumed to be faulty.
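A sketch of these checks on a single line, combined with the rule from the remarks that the source segment must not be empty. `is_faulty` is a hypothetical name, and treating a whitespace-only source segment as empty is an assumption.

```python
def is_faulty(line: str) -> bool:
    """Apply the fault checks to one TU line (a sketch, not Wordfast's code)."""
    if line.count("\t") < 6:  # fewer than 6 tabulators: not a valid TU
        return True
    fields = line.split("\t")  # at least 7 pieces from here on
    source_lang, source, target_lang = fields[3], fields[4], fields[5]
    # Language codes longer than 5 characters signal a damaged TU.
    if len(source_lang) > 5 or len(target_lang) > 5:
        return True
    # Assumption: a whitespace-only source segment counts as empty.
    return source.strip() == ""
```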
Remarks:
- The date does not necessarily use a tilde (~) to separate date and time. Any printable character can be used there, except a digit. Wordfast uses the tilde (~) and the equal sign (=). In the Wordfast editor, the equal sign means the TU was "marked" (flagged). This has no consequence at all on the TU's status: it remains fully valid. Although Wordfast always records the date and time when writing a TU, the date and time are optional and could be empty (or even an invalid date), in which case Wordfast simply assumes the current date and time. All dates and times are "local", taken from the local computer's clock.
- If an optional field is left empty, its trailing tabulator should still be present. For a TU to be valid, it must contain at least six tabulators, and the fifth field (the source segment, located between the fourth and the fifth tabulator) must contain at least one printable character.
- The date's first character (a digit from 0 to 9; usually 2 for a TU created in the current millennium) may be replaced by an "x". This means that the TU is no longer valid: the next full reorganisation of the TM by Wordfast will erase it. Do not remove the "x" or replace it with a digit unless you know what you are doing.
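A sketch of reading the date field under these remarks. `DateField` and `read_date_field` are hypothetical names; the sketch assumes a usable date is exactly 15 characters long and recognises only a lowercase "x" as the invalidation mark.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class DateField(NamedTuple):
    when: Optional[datetime]  # None if empty, invalid, or overwritten by "x"
    marked: bool              # "=" separator: TU flagged in the editor, still valid
    deleted: bool             # leading "x": TU erased at the next full reorganisation


def read_date_field(value: str) -> DateField:
    marked = len(value) > 8 and value[8] == "="
    deleted = value.startswith("x")
    when = None
    # Assumption: yyyymmdd, one printable non-digit separator, then hhmmss.
    sep_ok = len(value) == 15 and value[8].isprintable() and not value[8].isdigit()
    if not deleted and sep_ok:
        try:
            when = datetime.strptime(value[:8] + value[9:], "%Y%m%d%H%M%S")
        except ValueError:
            when = None  # invalid date: Wordfast would assume the current date and time
    return DateField(when, marked, deleted)
```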