Difference between revisions of "Glossaries Wordfast Classic"

From Wordfast Wiki

Revision as of 13:11, 2 October 2017

Getting started

  • In Ms-Word, create a new document. In this new document, type a short series of source terms followed by a tabulator (press the tabulator key), followed by their translation, then Enter, as in the following example:


work travailler

country pays

country of residence pays de résidence


  • Name and Save the new document as "Text-only" (preferably Unicode or Encoded Text) using the File/Save as... menu. Congratulations, you have created a WFC glossary. Close the glossary document.
  • In WFC, go to the dialog box shown above (Terminology/Glossary X). Click the "Select glossary" button, find and open the glossary you just created (in the "File type" list, select "Text", or "All files").
  • Click the Reorganise button. This will make WFC sort the glossary on source terms, and index all entries.
  • Make sure the "This glossary is active" checkbox is checked, so WFC performs terminology recognition using this glossary during translation sessions. If you uncheck this checkbox, terminology recognition is suspended.
  • Close WFC.


In a new document with some text that includes any of the source terms listed above (like "work", "country", etc.), start a translation session. Normally, these terms should be highlighted in light blue when a source segment includes them. This means that WFC has recognised that these terms are present in glossary #1. You can select blue-highlighted terms with the Ctrl+Alt+Left/Right shortcuts and see their translation in the status bar, or copy their translation to the insertion point in the target segment with Ctrl+Alt+Down. If you place the cursor on a blue-highlighted term and press Ctrl+Alt+G, the glossary drop-down list will open and show the glossary entry. This same toolbar also enables you to open the glossary editor window.


The "Lock Case" checkbox forces a glossary-wide case lock on target terms. See further below for an explanation of "Lock case", a feature which also exists at the level of each individual glossary entry, which is the recommended option.

Adding terminology

In a document, during a translation session or at any other time, select a source term and press Ctrl+Alt+T; then select a target term and press Ctrl+Alt+T again. This displays the dialog box below, where you finalize the pair of terms you want to add to a glossary:


Lock case

The Lock Case checkbox is unchecked by default. When checked, it locks the target term's case.

In linguistics and dictionaries, the default appearance of terms is lower-case, except when case is a defining part of the term as in proper names, acronyms, etc. With WFC, when you place terminology in the target segment, using the AutoSuggest feature or the Ctrl+Alt+Left/Right/Down set of shortcuts, WFC tries to replicate the source term's case to save time.

Suppose your glossary contains a pair of source-target terms such as "summary" -> "résumé". You may have to translate various segments like:


Please send us a summary of your work.
Summary: see page 22.
3. SUMMARY


If the Lock Case checkbox is unchecked for that term, Wordfast will automatically adapt "résumé" to the source case, so it will correctly place "résumé", "Résumé" or "RÉSUMÉ" as needed in the translation of those three segments.
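The case adaptation described above can be pictured with a minimal Python sketch. This is not WFC's actual code: the function name adapt_case and its three simple rules (all upper case, initial capital, otherwise unchanged) are assumptions made for illustration only.

```python
def adapt_case(source_term: str, target_term: str, lock_case: bool = False) -> str:
    """Recase target_term to mirror source_term, unless the case is locked.

    Hypothetical helper: the three rules below are an assumption about how
    case replication could work, not a description of WFC internals.
    """
    if lock_case:
        # Lock Case checked: the target term is placed exactly as stored.
        return target_term
    if source_term.isupper():
        # "SUMMARY" -> "RÉSUMÉ"
        return target_term.upper()
    if source_term[:1].isupper():
        # "Summary" -> "Résumé"
        return target_term[:1].upper() + target_term[1:]
    # "summary" -> "résumé"
    return target_term


print(adapt_case("summary", "résumé"))
print(adapt_case("Summary", "résumé"))
print(adapt_case("SUMMARY", "résumé"))
print(adapt_case("January", "janvier", lock_case=True))
```

The last call shows why locking is useful for month names: the stored lower-case "janvier" is kept even though the source term is capitalized.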

There are numerous exceptions: working to or from German, for example, or having terms whose case usually differs between the two languages. Month and day names in English have an upper-case first letter (January, February...) while that is generally not the case in French (janvier, février, unless they start a sentence). This is when lock case comes in handy for a glossary entry.

You can check, then right-click the lock case checkbox in the glossary entry dialog box to make it the default setting when adding terminology. That is recommended only if most of your terminology (like working into German) has a different case.

You can also enforce a glossary-wide, unconditional "lock case" by checking that same checkbox in the Terminology > Glossary setup. That glossary-level setting supersedes the entry-level setting.


Fields

The Terminology addition dialog box has three Fields that are made to receive codes or special mentions that do not belong in "Source entry", "Target entry", or "Comment" fields.

Many translators add their own codes to glossary entries so they can later sort glossaries and extract selected terms.

For example, if you work on a project for a certain client, you may wish to add a client code to each glossary entry you create for this client, so that later you may distinguish them from other entries.

Since entering these codes is usually a repetitive task, two automatic features are added here:


- Enter text in a "Field" area, then right-click the "Field" caption right before the textbox. WFC will ask whether you want this text to be entered by default every time this dialog box pops up for a new term.

- Enter special codes that will automatically be replaced with certain values when you validate the new entry. These codes are:


There are many other ways to add terminology. One way is to open the glossary with Ms-Excel, then type, or even copy-paste, rows and columns of data. Do not forget to close the glossary in Ms-Excel before using Ms-Word and WFC, because Ms-Excel keeps the glossary locked at all times when it is opened.

The Data Editor is also a way to manage and review glossaries.


Glossary format

A WFC glossary is a tab-delimited, text-only file containing 2 or 3 columns (source term, target term, optional comment). Additional columns can be present. Unicode text is accepted. "Columns" in a tab-delimited text-only file are items separated by tabulators. If opened with Excel, the items in such a tab-delimited TXT file will be neatly distributed into columns. If opened with Ms-Word, you would need to select the text and use the Table/Convert text to table menu to actually see items in a table format, with visible columns (but, before saving the text document, you would need to convert the table back to tab-delimited text).

Glossaries can be created or edited using Microsoft Excel. The first column (column A) should contain source terms, the second column (column B) should contain target terms, the third column should contain comments, if any. The Excel spreadsheet thus created should be saved as "Tab-delimited text" using Excel's File/Save as... menu.
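A glossary in this format can also be produced or inspected with a short script. The following Python sketch is only an illustration: the function names write_glossary and read_glossary are hypothetical, and the choice of Python's "utf-16" codec as a stand-in for "Unicode text" is an assumption, not something stated by WFC.

```python
def write_glossary(path, entries, encoding="utf-16"):
    """Write (source, target[, comment, ...]) rows as tab-delimited text.

    Assumption: "utf-16" (written with a byte-order mark) is used here as
    an approximation of the "Unicode text" format mentioned above.
    """
    lines = []
    for row in entries:
        # A tab or line break inside a field would corrupt the columns.
        if any("\t" in field or "\n" in field or "\r" in field for field in row):
            raise ValueError(f"field contains a tab or line break: {row!r}")
        lines.append("\t".join(row))
    # newline="" keeps the explicit \r\n line endings untouched.
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write("\r\n".join(lines) + "\r\n")


def read_glossary(path, encoding="utf-16"):
    """Return the glossary as a list of rows, one list of columns per entry."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        text = handle.read()
    # Skip blank lines; split every remaining line on tabulators.
    return [line.split("\t") for line in text.splitlines() if line.strip()]
```

For example, write_glossary("glossary.txt", [("work", "travailler"), ("country", "pays", "noun")]) creates a two-entry glossary whose second entry carries a comment; the glossary would still need to be reorganised in WFC afterwards.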


Format when saving:

If the glossary is a Ms-Word table, immediately before saving it, select the entire table (with the Table/Select table menu), use the Table/Convert to text menu and convert the table to text, with the tabulator set as delimiter. Save your document as Text-only, or Unicode text if needed.

If the glossary is an Excel spreadsheet, save it as Tab-delimited text with Excel's File/Save as... menu. The Tab-delimited Text format is selected in the "File type" drop-down option list.


Terminology format

Terms can use upper and/or lower case. Avoid unnecessary characters like brackets, quotes, slashes, dashes, etc. unless absolutely necessary. The * wildcard can be used at the end of a term, if different endings of a term are possible (this is called MFTR and is described below). Here is a sample English-French glossary:


Maintenance*	Entretien*

Interview*	Entrevue*

minimum wage*	salaire* minim*


Do not place the * wildcard less than four characters from the beginning of an entry. So, pa* the bill* is not valid; use three entries like pay the bill*, pays the bill* and paid the bill*.

During a translation session, press Shift+Ctrl+G to load glossaries into a toolbar drop-down list for better visibility. Outside sessions, use Ctrl+Alt+Left/Right to display/hide the glossary lists. Note that glossaries of more than 5,000 entries, or more than 200 Kbytes, cannot be loaded into a toolbar drop-down list. But when looking up terms, WFC will load the term, plus 50 terms before and after the found term, for reference. These large glossaries can nevertheless be used for all other operations: QA, terminology recognition, etc. They are fully opened and editable using the glossary editor (the icon after the glossary drop-down list).



Multiple glossary entries

WFC accepts multiple glossary entries as follows:


avocat	attorney

avocat	barrister

avocat	lawyer

avocat	avocado

etc.


Add {preferred} to either the Comment field, or any of the three Note fields to show WFC which entry is preferred when propagation is used.
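As an illustration of how a {preferred} tag can single out one of several entries, here is a minimal Python sketch. The function pick_entry is hypothetical, and its fallback (returning the first matching entry when none is tagged) is an assumption, not documented WFC behaviour.

```python
def pick_entry(rows, source_term):
    """Return the target term for source_term, favouring a {preferred} row.

    Each row is a list of columns: source, target, then the optional
    comment and note fields. Falling back to the first match when no row
    is tagged is an assumption made for this sketch.
    """
    matches = [row for row in rows if len(row) >= 2 and row[0] == source_term]
    for row in matches:
        # Look for the tag in the comment or in any later field.
        if any("{preferred}" in field for field in row[2:]):
            return row[1]
    return matches[0][1] if matches else None


rows = [
    ["avocat", "attorney"],
    ["avocat", "barrister"],
    ["avocat", "lawyer", "{preferred}"],
    ["avocat", "avocado"],
]
print(pick_entry(rows, "avocat"))
```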



Fuzzy Terminology Recognition (FTR)

FTR in WFC can be automatic (AFTR), or manual (MFTR).


MFTR is done by manually adding asterisk wildcards (*) at the end of words in the glossary so that most inflections of the glossary entry will be recognized in the document. For example, a glossary source entry like


Digital Analog* Converter*

(red colour added for emphasis)


will allow WFC to recognize, in the document, various similar forms such as


Digital Analog Converters

Digital Analogic Converter

etc


if they are found in the source segment.


The asterisk wildcard (*) can also be placed at the beginning of a source glossary entry in case the beginning of the word is what changes with inflections. For example, if the following entry is in the glossary:


*Работа*


it will match Поработает, Поработала, etc. in the document.


The asterisk can also be placed inside a word. For example, if the following entry is in the glossary:


Methyl*one


it will match methylisothiazolinone, methylprednisolone, etc. in the document, but it will not match methylisoline.


The pipe (|) can be placed inside a term, and is equivalent to an ending asterisk: anything after the pipe will be ignored.


The question mark (?) can replace any single character:


Methyl?one


will match Methyleone or Methylhone, but not Methylheone.


The sharp sign (#) can replace figures:


$#-fine


will match $200,000-fine and $200-fine.


If more than one asterisk is placed in a glossary source entry, the asterisks should be separated by at least four letters; otherwise, terminology recognition can become unreliable and slow. Entries like if*le* will not yield reliable results. In any case, the primary purpose of glossaries is to recognize technical jargon that a translator is not comfortable with; attempting to use glossaries as some sort of machine translation is a waste of time and may lead to frustration. Do not overload glossaries with common source-language terms.
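The MFTR wildcards described above can be approximated with regular expressions. The Python sketch below is an illustration under stated assumptions, not WFC's matching engine: the names entry_to_regex and entry_matches are hypothetical, the asterisk is assumed to stand for any run of letters or digits, and the sharp sign is assumed to stand for a figure that may contain grouping commas or periods.

```python
import re


def entry_to_regex(entry: str) -> re.Pattern:
    """Translate a glossary source entry with wildcards into a regex.

    Assumed mapping (for illustration only):
      *  any run of word characters, possibly empty
      ?  exactly one character
      #  a figure: digits, optionally with , or . separators
      |  everything after the pipe is ignored (like an ending asterisk)
    """
    if "|" in entry:
        entry = entry.split("|", 1)[0] + "*"
    parts = []
    for char in entry:
        if char == "*":
            parts.append(r"\w*")
        elif char == "?":
            parts.append(".")
        elif char == "#":
            parts.append(r"\d[\d.,]*")
        else:
            # Every other character must match literally.
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.IGNORECASE)


def entry_matches(entry: str, text: str) -> bool:
    """True if the whole of text is covered by the glossary entry."""
    return entry_to_regex(entry).fullmatch(text) is not None


print(entry_matches("Digital Analog* Converter*", "Digital Analogic Converter"))
print(entry_matches("Methyl*one", "methylprednisolone"))
print(entry_matches("Methyl*one", "methylisoline"))
print(entry_matches("$#-fine", "$200,000-fine"))
```

Matching is case-insensitive here as a simplification; real terminology recognition would also have to locate the term inside a full source segment rather than compare it against an isolated string.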


AFTR is useful on raw glossaries, where the translator has no time to manually place asterisks as explained above. WFC uses various techniques that attempt to automatically make up for the possible inflections of terms found in the document's source text.


Note that glossaries can be hybrid: they can contain both AFTR (raw) and MFTR (asterisked) entries. If any entry has an asterisk, WFC will not attempt AFTR on that entry, but make use of the asterisk. If two entries match the same queried term, the MFTR entry will be chosen rather than the match brought up by AFTR. However, if an un-asterisked glossary entry perfectly matches a queried term (neither AFTR nor MFTR needed), then of course this entry will prevail over all others.


WFC can use more than one glossary. This enables you to simultaneously use both client terminology and your own, homegrown terminology, in two distinct glossaries. You can even set color schemes to immediately spot from which glossary a term has been recognised.


Client terminology is usually rushed together with the job and, in some cases, can even be sent after the job has started, by overworked project managers. Manually fuzzying-up a glossary takes time and is best done between jobs, in spare time. This is why AFTR is acceptable for rushed client terminology, in the heat of a live project.


AFTR attempts to recognize most inflections. AFTR is by nature an imprecise (fuzzy) process, and may bring up occasional mismatches, which should simply be ignored, or, if time permits, lead to manual fixing/fuzzying (MFTR) in the glossary. Here are a few observations:


AFTR fares poorly on single, short words.

The longer a word, the better AFTR stands a chance to correctly recognize it.

AFTR is better on expressions of 2 or more words.

AFTR may be defeated by large glossaries that have many terms, especially made of single, short words, that look similar.


The conclusion is that AFTR should not be attempted on large glossaries with many similar entries. And in no case can AFTR be used for "autoassembly" schemes, or be a substitute for machine translation.


Typical client-supplied terminology looks like this (target terms omitted):


two-way multiplexed autoresponder

double furnace boiler

dichotomic search

DOS-based application

etc.


This is where AFTR really helps, and yields best results. Once the job is completed, and you have a spare hour, you may consider integrating client terminology into one of your existing glossaries, and manually add asterisks as follows:


two-way multiplexed autoresponder*

double furnace boiler*

dichotomic search*

DOS-based application*


This way, your homegrown glossary runs on MFTR rather than AFTR.


The essence of AFTR is to determine a word's stem by gradually stripping letters from the word's end. Note that we deal here with statistics: there are exceptions to this rule, and every language has its requirements. The verb go, for example, changes into went in the past tense, thereby defeating any AFTR attempt. Fortunately, client terminology is primarily made of technical words and expressions, where nouns outnumber verbs. And technical jargon (some of which is imported) is less prone to wild variations than common, classic terms. Glossaries are primarily used for jargon, and more precisely, client jargon: the translator is supposed to understand common language.


Stripping is done gradually, in increments of one trailing letter, up to a maximum of four letters. A word like applications (found in the source segment) will first be reduced to application, then to applicatio, then applicati, etc. Obviously, the first attempt (producing application) would hit a match in the glossary, provided the glossary has an entry for application.
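The gradual stripping can be sketched in a few lines of Python. This is a simplified illustration of the principle only: the function strip_lookup is hypothetical, it assumes the glossary is a dictionary keyed by lower-case source terms, and WFC's real AFTR uses further techniques that are not reproduced here.

```python
def strip_lookup(word, glossary, max_strip=4):
    """Look up word, then word minus 1..max_strip trailing letters.

    glossary maps lower-case source terms to target terms (an assumption
    of this sketch). Returns (matched_source, target) or None.
    """
    lowered = word.lower()
    # Never strip the word down to nothing: keep at least one letter.
    limit = min(max_strip, len(lowered) - 1)
    for removed in range(limit + 1):
        candidate = lowered[: len(lowered) - removed]
        if candidate in glossary:
            return candidate, glossary[candidate]
    return None


glossary = {"application": "application", "boiler": "chaudière", "go": "aller"}
print(strip_lookup("applications", glossary))
print(strip_lookup("Boilers", glossary))
print(strip_lookup("went", glossary))
```

The last call returns None: no amount of trailing-letter stripping turns went into go, which is the limitation mentioned above for irregular verbs.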



How to load a glossary

Three glossaries can be selected in the WFC/Terminology/Glossary tabs. Click the "Select glossary" button to find and specify the glossary you want to use (WFC glossaries have a TXT extension). Then click the "Reorganise" button to have the glossary sorted and indexed by WFC. You can view/edit the glossary with Ms-Word.



Using glossaries for QA

Check the appropriate option in the QA pane in WFC > Setup. From then on, during a translation session, when the translator validates a translation, WFC will look for each source term in the source segment. If a source term is found in the source segment, WFC will expect to find the corresponding target term in the target segment. If it fails to do so, it will warn the user, giving a choice of editing the translation or ignoring the warning.
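The logic of this check can be summarized with a minimal Python sketch. It is an assumption-laden illustration, not WFC's implementation: the function qa_missing_terms is hypothetical and relies on plain case-insensitive substring matching, without any fuzzy recognition or word-boundary handling.

```python
def qa_missing_terms(source_segment, target_segment, glossary):
    """Return the (source, target) pairs whose target term is missing.

    glossary is a list of (source_term, target_term) pairs. Matching is a
    plain case-insensitive substring test, a simplification for this sketch.
    """
    source_lower = source_segment.lower()
    target_lower = target_segment.lower()
    missing = []
    for source_term, target_term in glossary:
        found_in_source = source_term.lower() in source_lower
        found_in_target = target_term.lower() in target_lower
        if found_in_source and not found_in_target:
            # The source term is present but its translation is not.
            missing.append((source_term, target_term))
    return missing


glossary = [("country", "pays"), ("work", "travailler")]
print(qa_missing_terms("My country of residence", "Ma nation de résidence", glossary))
```

An empty list means the translation passes; any pair returned corresponds to a situation in which the user would be warned and could either edit the translation or ignore the warning.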



Quick search

Put the cursor on a word, then use Ctrl+Alt+G to search for a term in all three glossaries during, or outside, a translation session (if it's an expression, select the entire expression before pressing Ctrl+Alt+G). The glossaries will be loaded in the toolbar drop-down list if their size is less than 200 Kbytes. If their size is larger, WFC will load the nearest 100 entries before/after the found item.


Select/deselect glossaries

Use the "Select glossary" button to select a glossary.

If you want to keep a glossary selected, but don't want this glossary to be active, i.e., if you do not want WFC to perform terminology recognition on this glossary, uncheck the "This glossary is active" checkbox. Otherwise, keep this checkbox checked. This checkbox is automatically checked each time you use the "Select glossary" button.

For propagation to occur, the corresponding "Propagate" command must be activated in Pandora's box.

For the Quality Assurance terminology warnings to occur, the "Use for QA verification" checkbox must be checked.
