There are two ways to store terms in CafeTran
- In Termbases which are TMX files
Use them if you want to have fuzzy term recognition
- In Glossaries which are plain-text files with TABS
Use them if you want to have compact files that are easy to manage
Glossaries versus Termbases
Watch a video about Glossaries versus Termbases
PLEASE NOTE: After importing the Excel file, you should always save the termbase under a new name via Memory > Save memory as …
Creating a Termbase
At the present time you create a Termbase from the Memory menu:
- In the Memory menu choose New memory.
The dialogue box New memory is displayed (Note that this dialogue box should rather be called New Termbase).
- Adjust the settings in the dialogue box to optimise for Termbases (actually I think that these should be default settings for Termbases):
Adding term pairs to your Termbase
Once you have created a termbase, you can start adding term pairs to it.
- Select one or more words in the Source segment pane.
- Select one or more words in the Target segment pane.
- Click on the icon Add fragment to memory:
The New fragment dialogue box will open.
- Make any necessary modifications and additions.
- Click the OK button to close the dialogue box and add a term pair to your termbase:
Manually overriding the auto-assembling result
Both in Termbases and Glossaries you can manually override the result of the auto-assembling process, that is: change the target term that CafeTran will insert automatically in the Target segment pane.
- Position the text cursor in the target term that you want to override, or, when the target term consists of several words (a so-called multi-word term), select all those words.
- Click the right mouse button to open the context menu.
- Use right mouse button to override auto-assembling result only once (in this segment).
- Use left mouse button to override auto-assembling result only once (in this segment).
Here the process is shown for the Termbase:
And just to be complete, here the process is shown for the Glossary:
Balancing your Termbases for auto-assembling
Let's have a look at the termbase that was created in the video:
Compare this with the structure of a glossary that uses source-side and target-side alternatives:
LA;Lüftungsanlage | VE;ventilatie-eenheid |
WLA;Wohnraumlüftungsanlage | residentiële ventilatie-eenheid;RVE |
NWLA;Nichtwohnraumlüftungsanlage | niet-residentiële ventilatie-eenheid;NRVE |
And answer for yourself the question:
Which resource for terminology is easier to manage with regards to balancing target terms for the auto-assembling process?
Browsing in Termbases
After loading a Termbase, the last entries will be displayed. You can browse in the Termbase via the context menu:
Using Termbases with truncation of terms
You can also use Termbases to let CafeTran ignore the ending of words and only match on the first part or stem of a word. This is sometimes called stemming, though truncation is probably a more appropriate name.
From the Wikipedia:
Stemming is the term used in linguistic morphology and information retrieval to describe the process for reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form.
Watch a video about Using stemming in CafeTran
The relevant setting is the Prefix matching (%) selector in the New memory (New termbase) dialogue box.
When you choose Fixed length only the first four letters will be used for term matching (term recognition) OR: the last four letters will be truncated. This value of 4 is set in Edit > Options > Memory > Minimal prefix length:
You can also set a percentage from 10 % to 90 %.
Here are 4 screenshots, made with Prefix matching 'Fixed length' active:
Note that all terms were recognised with Prefix matching set to 'Fixed length'. When the Prefix matching is set to '10%', all terms will be recognised too. When the Prefix matching is set to '80%', only the terms in the first 2 segments will be recognised.
PLEASE NOTE: You will have to experiment with the value for Prefix matching. This value will be depending on some characteristics of your source language and will likely vary per language.
Read an article written by a CafeTran user who only uses Termbases for his terminology