Preventing sentence segmentation at abbreviations
CafeTran has two different ways to prevent segmentation at abbreviations:
- By using SRX files, which suit more complex, generic rules (such as date rules) and are compatible with many other CAT tools. See: Segmentation rules
- The easy way: a simple, human-readable plain text file with one abbreviation per line.
Abbreviations lists
CafeTran auto-creates a simple abbreviations text file for a given source language.
On Windows systems it is located at (example for Dutch, nl_NL):
\CafeTran\cafetran\resources\abbreviations\abbreviations-nl_NL.txt
On Macs you can find it at (example for German, de_DE):
/Applications/CafeTran.app/Contents/Resources/Java/resources/abbreviations/abbreviations-de_DE.txt
These abbreviations lists can work independently or in tandem with the SRX segmentation rules.
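To make the idea of a one-per-line abbreviations list concrete, here is a minimal Python sketch of how such a list could be used to suppress sentence breaks. It is not CafeTran's implementation: the function names, the naive break rule and the assumption that each entry is stored with its trailing period are all illustrative.

```python
import re

def parse_abbreviations(text):
    """Return the set of non-empty lines (assumed: one abbreviation per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def split_sentences(text, abbreviations):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace,
    unless the text before the break ends with a listed abbreviation."""
    segments = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1]  # e.g. "bijv."
        if last_token in abbreviations:
            continue  # listed abbreviation: do not break here
        segments.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

abbreviations = parse_abbreviations("bijv.\nenz.\n")
print(split_sentences("Gebruik bijv. een lijst. Dat werkt goed.", abbreviations))
# prints ['Gebruik bijv. een lijst.', 'Dat werkt goed.']
```

Because the lookup compares a single whitespace-delimited token, this sketch only handles single-word abbreviations.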
Scanning for candidates for abbreviations
CafeTran can help you to identify abbreviations. This is especially useful for longer projects and/or team work.
- Create a CafeTran translation project.
- In the Tools menu, select Abbreviations and then Scan Project for abbreviations.
CafeTran will list abbreviation candidates in the tabbed pane.
- In the list of candidates click on any relevant abbreviation to add it to the Abbreviations file.
PLEASE NOTE: CafeTran will treat any one-, two- or three-letter string plus a period as a candidate for the list of abbreviations.
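The candidate rule in the note above can be approximated with a regular expression. The following Python sketch is only an illustration of that rule, not CafeTran's scanner; the pattern and the function name are assumptions.

```python
import re
from collections import Counter

# A whole word of one to three letters, immediately followed by a period.
CANDIDATE = re.compile(r"\b[^\W\d_]{1,3}\.")

def scan_candidates(segments):
    """Count every short-word-plus-period string found in the segments."""
    counts = Counter()
    for segment in segments:
        counts.update(CANDIDATE.findall(segment))
    return counts

counts = scan_candidates(["See p. 12 and fig. 3 for details.", "Use it. Then stop."])
print(sorted(counts))
# prints ['fig.', 'it.', 'p.']
```

Note that "it." shows up although it is an ordinary word at the end of a sentence. A rule this simple cannot tell the difference, which is why the candidates need to be reviewed before they are added to the list.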
Using dynamic segmentation
While you translate, you can add new abbreviations via Tools > Abbreviations > Add selection to abbreviations (Ctrl+Shift+B). CafeTran will no longer break the text at that abbreviation. You can add a single word or several consecutive words as an abbreviation. Always include the trailing period in your selection.
CafeTran auto-joins segments at every subsequent occurrence of an abbreviation added via Tools > Abbreviations > Add selection to abbreviations; the project as a whole is not re-segmented.
PLEASE NOTE: In contrast to single-word abbreviations, multi-word abbreviations won't trigger the new automatic segment joining mechanism and will only work in the segmentation phase of a new project.
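The joining behaviour described above can be pictured with a small Python sketch. This is a simplified illustration under stated assumptions, not CafeTran's mechanism: segments are plain strings, and a segment is joined with its successor when its last whitespace-delimited token equals the newly added abbreviation. The function name is hypothetical.

```python
def join_at_abbreviation(segments, abbreviation):
    """Join a segment with the following one when it ends with the abbreviation."""
    joined = []
    for segment in segments:
        previous_tokens = joined[-1].split() if joined else []
        if previous_tokens and previous_tokens[-1] == abbreviation:
            # The previous segment was cut at the abbreviation: glue them together.
            joined[-1] = joined[-1] + " " + segment
        else:
            joined.append(segment)
    return joined

segments = ["The meeting is at approx.", "10 o'clock.", "Be there."]
print(join_at_abbreviation(segments, "approx."))
# prints ["The meeting is at approx. 10 o'clock.", 'Be there.']
```

Because the comparison looks at a single token, an entry that contains a space never matches in this sketch, which loosely mirrors the limitation for multi-word abbreviations noted above.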
Steps
- Load a CafeTran project.
- From the Tools menu select Abbreviations and then Scan Project for abbreviations.
CafeTran lists potential new abbreviations for the source language of your translation project. Only candidates of one to three letters are listed.
- Click on the candidates that you want to approve as abbreviations and add to your personal list.
- From the Tools menu select Abbreviations and then Edit abbreviations to check your list.
- You can now also add multi-word abbreviations and any abbreviation that is longer than three letters.
Adding abbreviations on the fly
- Select the abbreviation that you want to add to the list, including the period.
- Open the context menu to add the abbreviation to the list.
- Confirm the abbreviation by clicking OK.
- In the next segment that contains the newly added abbreviation, CafeTran no longer breaks the text at that abbreviation.
Example files with abbreviations
Want a head start? Add the contents of these language-specific files to your abbreviations-xx_YY.txt file. Use the free Notepad++, TextPad or TextWrangler, and make sure that you use UTF-8 encoding and Unix line endings (LF). Copy the content of the files to the clipboard, open the abbreviations list in the tabbed pane and paste the additions. Then save and close the abbreviations file.
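If you prefer to merge a downloaded list outside CafeTran, a short script can take care of the encoding and line endings. The Python sketch below is one possible approach, not a CafeTran feature: the function name and file paths are hypothetical, and it assumes the files hold one abbreviation per line. Make a backup of your abbreviations file before overwriting it.

```python
from pathlib import Path

def merge_abbreviation_files(target, additions):
    """Append new entries from `additions` to `target`, skipping duplicates.

    The result is written back as UTF-8 with Unix (LF) line endings.
    """
    target = Path(target)
    merged = []
    seen = set()
    for path in (target, Path(additions)):
        # "utf-8-sig" also accepts files that start with a byte order mark.
        for line in path.read_text(encoding="utf-8-sig").splitlines():
            entry = line.strip()
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    # newline="\n" prevents translation to CRLF on Windows.
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(merged) + "\n")
    return merged
```

A call such as `merge_abbreviation_files("abbreviations-xx_YY.txt", "downloaded-list.txt")` keeps the existing entries in their original order and adds only the entries that are not yet in the list.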
- Dutch abbreviations (WARNING: This file is causing problems and is currently being analysed by the Bug Team.)
- English please contact ku.rejieb|leahcim#ku.rejieb|leahcim
- German abbreviations
- French abbreviations
- Italian abbreviations
- Polish abbreviations
- Russian abbreviations
- Spanish abbreviations
See also: Wikipedia page about abbreviations