Wordfast Classic User Manual

From Wordfast Wiki
Revision as of 00:23, 29 October 2017 by Samar (talk | contribs) (Translation Memory)

Translation Memory

This section lets you select a TM or create a new one, define TM attributes, set TM rules, set up a Background TM, set up a remote TM with Wordfast Server or Wordfast Anywhere, and set up machine translation.

Terminology

Setup

Word/character count & billing

Wordfast's way of counting words is slightly different from Ms-Word's statistics (Tools/Wordcount or Tools/Statistics). For example, in the following text:

L'argent de Louis-Philippe

Ms-Word will find 3 words, while Wordfast will find 5 words (most translation tools use a very similar word count). On average, Wordfast finds 5 to 10% more words than Ms-Word, depending on the language. The difference is more striking with French, more modest with other languages. This way of counting is in keeping with the practice of most translation syndicates and unions in most countries using alphabetic languages.
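The difference between the two counts can be illustrated with a minimal Python sketch. It assumes that the higher count comes from treating apostrophes and hyphens as word separators; Wordfast's exact counting rules are not given here, so the function names and the separator list are illustrative only.

```python
import re


def whitespace_word_count(text: str) -> int:
    """Count tokens separated by whitespace only."""
    return len(text.split())


def split_word_count(text: str) -> int:
    """Count words, also splitting on apostrophes and hyphens (assumption)."""
    return len([w for w in re.split(r"[\s'’\-]+", text) if w])


sample = "L'argent de Louis-Philippe"
print(whitespace_word_count(sample))  # 3
print(split_word_count(sample))       # 5
```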

Discuss the word count issue with your client before starting work on a project.

On tagged documents, tags are counted as one word (regardless of their number of characters or words) and their number is also reported in the analysis final report. A tag is defined as any contiguous series of characters (spaces included) that have the tw4winInternal style.

Note that tags are included in the word count (each tag is counted as one word), but they are not included in the character count.
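As a rough illustration of this rule, the sketch below counts each tag as one word and leaves tags out of the character count. It is only a sketch: in a real document a tag is identified by the tw4winInternal style, whereas this example assumes, for demonstration, that tags appear in plain text as <...>, and that characters are counted without spaces.

```python
import re

# Assumption: a tag is anything between angle brackets (spaces included).
TAG = re.compile(r"<[^<>]*>")


def count_tagged(text: str) -> dict:
    """Return word, tag and character counts for a tagged string."""
    tags = TAG.findall(text)
    words = TAG.sub(" ", text).split()
    return {
        "words": len(words) + len(tags),           # each tag counts as one word
        "tags": len(tags),
        "characters": sum(len(w) for w in words),  # tags are left out
    }


print(count_tagged("Click <b>here</b> now"))
```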

The Wordfast word/character count, as with all CAT tools, is based on what the tool considers to be translatable text. This can depend on the way you set up your tool. For example, the use of the "SegmentAll" command will force Wordfast to consider any text as translatable, including isolated fields, figures, etc. which would otherwise be left out of the translation process.

The Wordfast word/character count includes all headers, footers and footnotes, but not fields. Pay attention to the word count when auditing a project or producing an estimate. Ask yourself, where applicable: whether the document(s) contain bookmarks and, if so, what the author/client wants to do with them; whether graphics or textboxes should be translated; whether headers and footers should be translated; whether the word count is based on the source or the target (translated) text; how to count tags, if any (per piece? per word? per character? at what rate?); etc.

Special care

This section deals with expert uses of Wordfast for tasks that require special attention. Because of its very nature, Wordfast does not guarantee operation. Wordfast is an add-on to a complex program (Ms-Word) that handles documents which, in the course of their lives, have been handled very differently by different people using different versions of Ms-Word (on PCs or Macs), through different formats (Doc, Rtf, Html, etc.), and which are sometimes very ill-conceived (many people use textboxes when tables, or a much simpler layout, should be used).

The Special Care section deals with tasks that are possible with Wordfast, but which need special attention in their execution, as well as a good knowledge of Ms-Word. Beginners should train themselves, or seek professional training, before engaging in projects outlined in the Special Care section. It is out of the question for any translator to accept a "Special care" job without a prior understanding of the risks involved.

Fields and objects

An Ms-Word document can contain fields or objects like hypertext links, buttons, graphics etc. Normally, fields should not be translated (unless specifically required by your client, like index fields, for example), but copy-pasted into the translation. Note that the display options in Tools/Options/View can toggle the two views of fields: either the result of the field (a field is an instruction processed by Ms-Word, usually resulting in some displayed text - the result), or field codes, which look like {DATECREATION \* FUSIONFORMAT}. I recommend using the icon that toggles the two views (use the View/Toolbars/Customise menu, click the Commands tab, then View in the list, then drag-drop the {a} icon into the toolbar of your choice), or the Alt+F9 Ms-Word shortcut.

To understand this concept visually, set your current View mode to "Normal" with the View menu in Ms-Word. Press Alt+F9 a few times to see the two ways of looking at fields (result or code). The following "Today's date" field: 26/09/2017 should toggle between the two views. This manual's Table of Contents is actually a TOC field. If you were to translate this manual, you would not translate the Table of Contents, but merely update it by placing the cursor anywhere in the Table of Contents and pressing Ms-Word's F9 shortcut once the entire manual has been translated and cleaned up.

When fields are present in the source text and no proposition comes from the TM, you may consider using Wordfast's Copy source icon to copy the source segment into the target segment, and translate by overwriting it, leaving fields or objects unchanged. Otherwise, individual fields and objects should be carefully copy-pasted into the target segment's translation, at the appropriate location.

Dictionary

(PC only) Wordfast can be linked to virtually any external dictionary application, such as the Collins On-line, Harrap's Shorter, Merriam Webster's, Microsoft Encarta, any web-based dictionary or database, Trados Multiterm, etc., using the Select dictionary button of the Terminology/Reference tab.

The access keystroke (Keys button) defines the keystrokes used for accessing an external dictionary, where some fields are replaced with values as in the following table:

Field              Will be replaced by Wordfast with                            Example
{SearchWord}       the word you are searching for                               house
{SourceSegment}    the text of the source segment (without tags, if any)
{TargetSegment}    the text of the target segment (without tags, if any)
{SL-CD}            the source language code with local variant                  EN-US
{SL}               the source language code, in 2 characters                    EN
{TL-CD}            the target language code with local variant                  FR-FR
{TL}               the target language code, in 2 characters                    FR
{pause}            pauses the execution for 200 milliseconds
{PAUSE}            pauses the execution for 4 seconds
{pause=Harraps}    pauses until the application's window caption contains the string "Harraps" (case-insensitive, 10-second timeout)
{Ms-Word}          returns the focus to the Ms-Word application

To set up the "Keys" parameter, start your dictionary application, then note the sequence of keystrokes necessary to perform a word search. Once this is done, click the Keys button and enter the caption of the dictionary application window, followed by a semi-colon, followed by the keys you noted. For example,

Harraps;{pause}{F3}{Escape}%e{SearchWord}{Enter}

will instruct Wordfast to look for an application whose window name begins with Harraps, activate it, pause for 200 milliseconds, then type an F3 key, followed by an Escape Key, then Alt+E, then the searched-for word, then an Enter key. All typable keys are simply entered as they are, in lowercase. Function keys and other special keys are entered as follows:

Key            Enter as       Key          Enter as
A, B, C etc    a, b, c etc    F1 etc       {F1} etc
Enter          {Enter}        End          {End}
Escape         {Escape}       Tabulator    {Tab}
Alt            %              Shift        +
Ctrl           ^              Up           {Up}
Down           {Down}         PageUp       {PgUp}
PageDown       {PgDn}         Home         {Home}
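To make the structure of a "Keys" string easier to see, here is a minimal Python sketch that splits such a string into the window caption and a list of key steps. It only interprets the string; it does not send any keystrokes. The function name, the "type:" prefix and the assumption that a modifier (%, ^, +) applies only to the key that immediately follows it are illustrative, not a description of Wordfast's internals.

```python
import re

# A token is either a {braced} item or a single character.
TOKEN = re.compile(r"\{[^{}]*\}|.", re.DOTALL)
MODIFIERS = {"%": "Alt", "^": "Ctrl", "+": "Shift"}


def parse_keys(setting: str, search_word: str):
    """Split 'Caption;keys' into the caption and a readable list of steps."""
    caption, _, keys = setting.partition(";")
    steps, pending = [], []
    for token in TOKEN.findall(keys):
        if token in MODIFIERS:
            pending.append(MODIFIERS[token])  # held for the next key only
            continue
        if token == "{SearchWord}":
            name = f"type:{search_word}"      # placeholder replaced by the word
        elif token.startswith("{") and token.endswith("}") and len(token) > 2:
            name = token[1:-1]                # special key or {pause}
        else:
            name = token                      # ordinary typable key
        steps.append("+".join(pending + [name]))
        pending = []
    return caption, steps


caption, steps = parse_keys(
    "Harraps;{pause}{F3}{Escape}%e{SearchWord}{Enter}", "house"
)
print(caption)  # Harraps
print(steps)    # ['pause', 'F3', 'Escape', 'Alt+e', 'type:house', 'Enter']
```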

Once the dictionary has been set up, close Wordfast. Position the cursor on a word, or select an expression, and click the Dictionary icon (or press Ctrl+Alt+D; for dictionary #2, use the Ctrl+Alt+F shortcut). Wordfast will launch the dictionary application (or activate the relevant window if the application is already running) and execute the sequence of keystrokes you defined.

Concordance search

The search for concordance will be done first in the background translation memory (if applicable), then in the regular translation memory. The purpose of Concordance search is to find Translation Units (TUs) that contain a given word or a set of words. The Ctrl+Alt+C shortcut or the Concordance icon launches the search. The search is case-insensitive and returns words that begin with the searched-for item. Searching for cat will bring TUs that contain cat, catering, caterpillar, etc., but not bobcat or supercat. Searching for *cat will bring TUs that contain words like bobcat or supercat, etc.

The AND operator can be used. Searching for cat+dog will bring TUs where the two words cat AND dog are found. If words are simply separated with spaces, the OR operator is assumed, so searching for cat dog will bring TUs where either cat OR dog is found. To search for an exact phrase, enclose it in straight quotes, so searching for "The cat chases the dog" will bring results where the phrase "The cat chases the dog" is literally found, regardless of case.
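The matching rules above can be summarised in a short Python sketch. This is a sketch under stated assumptions, not Wordfast's actual search code: it assumes that a leading * lets the term appear anywhere inside a word, and that in a mixed query the space (OR) separates groups whose +-joined terms must all be present.

```python
import re


def term_matches(term: str, text: str) -> bool:
    """Case-insensitive match of one search term against the words of a TU."""
    term = term.lower()
    words = re.findall(r"\w+", text.lower())
    if term.startswith("*"):
        return any(term[1:] in w for w in words)  # e.g. *cat finds bobcat
    return any(w.startswith(term) for w in words)  # e.g. cat finds catering


def concordance_match(query: str, text: str) -> bool:
    """Apply the quote, OR (space) and AND (+) rules to one TU."""
    query = query.strip()
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return query[1:-1].lower() in text.lower()  # exact phrase
    for group in query.split():                     # spaces mean OR
        terms = [t for t in group.split("+") if t]  # + means AND
        if terms and all(term_matches(t, text) for t in terms):
            return True
    return False


print(concordance_match("cat", "Outside catering"))       # True
print(concordance_match("cat", "A bobcat"))               # False
print(concordance_match("*cat", "A bobcat"))              # True
print(concordance_match("cat+dog", "The cat sleeps"))     # False
```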

Note that to open the dialog box that lets you specify such extended search options, you must start concordance search when no selection is made; if a selection is made (for example, one word is selected in the source segment), then Wordfast assumes that the selected word has to be searched and will directly search for it, without offering the extended search dialog box. This allows fast searches with minimal clicks or shortcuts.

The same rules apply for Reference searches as well.

If you check the "Search concordances in all sibling translation memories" option in Wordfast/Terminology/Other, the concordance search will be extended to other TMs present in the same folder as the currently active TM.

It is possible to cancel a Concordance search with the Escape key, or with the same shortcut that started the search (i.e., Ctrl+Alt+C).

TM and glossary management

WFC segmentation rules

The largest possible unit of segmentation with Wordfast, as with most translation tools, is the paragraph. Paragraphs end with a paragraph mark (ANSI 13, with or without line feed ANSI 10), page feed (ANSI 12), or end of cell (ANSI 7). Note that the manual line feed (ANSI 11) does not end a paragraph. Nevertheless, Wordfast can be set up to consider the manual line feed as ending a segment: see the section on customizing ESPs, or the note further below.

Wordfast attempts to recognize individual segments within a paragraph by parsing the paragraph and looking for End of Segment Punctuations (ESPs). The default ESPs used by Wordfast are . : ! ? as well as the tabulator mark, noted ^t by Wordfast, and the manual line feed, noted ^l . Users can edit the list of ESPs to fine-tune segmentation, although that is not recommended, as it breaks their TM compatibility with most other TMs.

If all ESPs are deleted, Wordfast segments at the whole paragraph level. This is not recommended, as some paragraphs may exceed the acceptable segment limit of 8,000 characters (nearly two large pages!) imposed by Wordfast, although segments of that size are very rare. If a segment is larger than 8,000 characters, Wordfast ignores the extra characters, which can be segmented with the "ForceSegment" shortcut.

To remain compatible with most other tools, Wordfast does not consider the manual line feed (noted ^l ) as ending a segment. Users can add ^l to the user-defined list of ESPs in Wordfast to break segments when a manual line feed (ANSI code 11, decimal) is encountered, which is generally considered more logical. However, by default, Wordfast does not end a segment at a manual line feed.

Within a paragraph, Wordfast will consider that it has reached the end of a segment if:

  1. the said segment ends with an ESP, AND
  2. a space is immediately after the ESP, AND
  3. the letter following that space is a capital letter, AND
  4. the character immediately before the ESP is not a number.

Rules 2, 3, 4 can be disabled by the user in the Wordfast > Setup > Segments pane. With CJK languages, rule 2 is always disabled, and the "wide-character" equivalent punctuations are also used.

The following sequence                  Example                         Produces
full stop, space, uppercase             Hello world. Hello world.       2 segments
full stop, space, lowercase             Hello world. hello world.       1 segment
full stop, space, number                Hello world. 10 Hello world.    1 segment
full stop, no space, upper/lowercase    Hello world.Hello world.        1 segment
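The four rules and the examples above can be reproduced with a minimal Python sketch. It is an illustration only, assuming the punctuation ESPs . : ! ? and ignoring the tabulator, abbreviations, CJK handling and the 8,000-character limit; the function name is hypothetical.

```python
ESPS = ".:!?"  # punctuation ESPs only; the tabulator is ignored in this sketch


def split_segments(paragraph: str, esps: str = ESPS) -> list:
    """Split a paragraph into segments using rules 1 to 4."""
    segments, start = [], 0
    for i, ch in enumerate(paragraph):
        if ch not in esps:                                        # rule 1
            continue
        space_after = paragraph[i + 1:i + 2] == " "               # rule 2
        capital_next = paragraph[i + 2:i + 3].isupper()           # rule 3
        digit_before = i > 0 and paragraph[i - 1].isdigit()       # rule 4
        if space_after and capital_next and not digit_before:
            segments.append(paragraph[start:i + 1])
            start = i + 2                                         # skip the space
    if paragraph[start:]:
        segments.append(paragraph[start:])
    return segments


print(len(split_segments("Hello world. Hello world.")))     # 2
print(len(split_segments("Hello world. hello world.")))     # 1
print(len(split_segments("Hello world. 10 Hello world.")))  # 1
print(len(split_segments("Hello world.Hello world.")))      # 1
```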

Rule concerning the beginning of a segment. If a segment begins with a series of numbers (or combination of numbers and full stops) followed by a full stop, Wordfast assumes that it's a numbering scheme, and skips the apparent numbering scheme. With the following text:

10. This is text

the segment will begin with "This is text", skipping the initial "10.". If the initial number is actually part of the segment, translators can press Alt+Delete (Unsegment), then select the entire sentence and press Shift+Alt+Down (ForceSegment). Translators can also set Wordfast to always override the number-skipping behaviour with the "SegmentAll" command in Pandora's Box.
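The number-skipping behaviour can be approximated with a regular expression, as in the Python sketch below. The pattern is an assumption made for illustration: it requires the numbering scheme to be followed by whitespace, so that a decimal such as "3.5 metres" is left alone; Wordfast's own test may differ.

```python
import re

# Digits and full stops, ending with a full stop, then whitespace (assumption).
NUMBERING = re.compile(r"^\d[\d.]*\.\s+")


def skip_numbering(text: str) -> str:
    """Return the text without an apparent leading numbering scheme."""
    return NUMBERING.sub("", text, count=1)


print(skip_numbering("10. This is text"))   # This is text
print(skip_numbering("1.2.3. This is text"))  # This is text
print(skip_numbering("3.5 metres"))         # 3.5 metres
```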

Parts of text not considered as segments. Isolated series/combinations of numbers, spaces and punctuation do not constitute a segment. For example,

100

100.89.67.90

100 (9078) // 67-56

will be skipped by Wordfast as being "numbers". But

100a

100.89.67ö90

100 (9078) // 67-56 é

will all be segmented, because at least one letter is present in each series of numbers/punctuations. The "SegmentAll" command in Pandora's Box will force Wordfast to segment isolated series of numbers/spaces/punctuation at all times.
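The test described above amounts to checking whether at least one letter is present. A minimal Python sketch, with a hypothetical function name, using the examples from this section:

```python
def contains_letter(text: str) -> bool:
    """True if the text holds at least one letter, so it would be segmented."""
    return any(ch.isalpha() for ch in text)


for sample in ["100", "100.89.67.90", "100 (9078) // 67-56",
               "100a", "100.89.67ö90", "100 (9078) // 67-56 é"]:
    print(repr(sample), "segmented" if contains_letter(sample) else "skipped")
```

The first three samples are reported as skipped and the last three as segmented. The "SegmentAll" command would bypass such a check entirely.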

Abbreviations. Users can specify a list of abbreviations in Wordfast > Setup > Segments. Wordfast will not end a segment if its last series of characters matches any of the abbreviations, case-sensitive. For example, if "Pr." is listed in the user-specified abbreviations, which is the case by default, the following sentence will be considered as making up a whole segment...

Here is Pr. Johnson.

... although "Pr." is followed by a full stop, a space, and a capital letter.
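The abbreviation check can be pictured as a test applied to the text up to and including the ESP, before a segment is closed. The Python sketch below is an assumption-based illustration: the function name is hypothetical, and the requirement that the abbreviation start at a word boundary is added here for safety and is not stated in this manual.

```python
def ends_with_abbreviation(text: str, abbreviations: list) -> bool:
    """True if text ends with a listed abbreviation (case-sensitive)."""
    for abbr in abbreviations:
        if not abbr or not text.endswith(abbr):
            continue
        before = text[:-len(abbr)]
        # Assumption: the abbreviation must not be the tail of a longer word.
        if not before or not before[-1].isalnum():
            return True
    return False


abbreviations = ["Pr."]
print(ends_with_abbreviation("Here is Pr.", abbreviations))  # True: do not split
print(ends_with_abbreviation("Here is pr.", abbreviations))  # False: case differs
```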

There are many translation-time shortcuts and options that let the translator fine-tune segments to expand them, shrink them, or force a selection of text to be considered a whole segment, regardless of rules. However, translators should remember to prefer default segmentation whenever possible, to remain compatible with other TMs.

Troubleshooting

Glossary of terms used in this manual

Appendix I - Understanding segmentation & TM

Appendix II - Language & spell check settings

A document can contain text written in different languages. In Ms-Word, the language is a text attribute, just as font, colour, etc. The Tools/Language menu is used to apply a certain language to a selection. This language setting is important, for example, when spell-checking. Usually, the client will send you a document where all the text has the source language (e.g. "English") as attribute. When translating, it is important that the target text receives the target language (e.g. "French") as attribute. This allows you to spell-check the target segments using the proper dictionary. This should be set up in Wordfast's Setup/Segments tab.

Wordfast will apply the specified target language (or default language, as specified in Wordfast/Setup/Segments) to the target segment. If, however, you have chosen the "leave unchanged" setting, Wordfast will not redefine the target language.

Appendix III - Macro samples

Appendix IV - Advanced Find/Replace

Note: the Wordfast Knowledge Base, accessible from http://www.wordfast.net, has more content on the following topic.

Ms-Word's Find/Replace feature (FR) accepts wildcards and advanced features. A good understanding of FR can save the day on numerous occasions. I had to oversee translation projects where, to my astonishment, translators were spending hours executing visual/manual Find-Replace actions that could have been safely executed automatically.

Sure, FR actions can be destructive if they're not executed properly, since they can modify unwanted parts of the document. On a short document, a visual/manual FR can be preferred, since setting up and testing a smart and safe FR can take a little while.

Note that PlusTools offers a FR feature that can be run over many files, both in manual and automatic mode, with the possibility to edit the document and restart the FR where it was interrupted.