WARNING: Use Terminal commands on your Mac or Linux PC only if you know what you are doing.
Command-line tools are a convenient way to manipulate your tab-delimited glossaries.
Cutting out the first column
You can cut out the first column of your CafeTran glossary, e.g. to spell-check it. The original file is left unchanged; the column is written to a new file. Here is an example with IATE (InterActive Terminology for Europe), the EU's multilingual termbase:
cut -f1 iate.txt > german.txt
See also: Spell-checking a glossary
Cutting out the second column
cut -f2 iate.txt > dutch.txt
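To try the two cut commands above without touching a real glossary, here is a minimal sketch on a tiny sample. The file names sample.txt, sample_col1.txt and sample_col2.txt and the two entries are made up for illustration.

```shell
# Build a tiny two-column, tab-delimited glossary (hypothetical sample data).
printf 'Haus\tHuis\nBaum\tBoom\n' > sample.txt

# cut splits on tabs by default, so -f1 and -f2 select the first and second column.
cut -f1 sample.txt > sample_col1.txt
cut -f2 sample.txt > sample_col2.txt

cat sample_col1.txt   # Haus, Baum (one per line)
cat sample_col2.txt   # Huis, Boom (one per line)
```

If a glossary uses a delimiter other than a tab, cut accepts it through the -d option.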
Changing a text to lowercase
Now we change the Dutch column to all lowercase:
tr '[:upper:]' '[:lower:]' < dutch.txt > lowercase.txt
Joining columns
Now we recombine the two columns:
paste german.txt lowercase.txt > modified.txt
More info: c't 2014, issue 22, pages 174-177
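The cut, tr and paste steps above can also be collapsed into a single pass with awk. The following is only a sketch on made-up sample data (the file names sample.txt, col1.txt, col2_lower.txt, three_step.txt and one_pass.txt are hypothetical); it runs both variants and compares the results. Note that how tr and awk's tolower treat accented letters depends on the implementation and locale, so check the output on your own glossary.

```shell
# Hypothetical sample glossary: German in column 1, Dutch in column 2.
printf 'Haus\tHuis\nBaum\tBoom\n' > sample.txt

# Three-step variant: split the columns, lowercase column 2, join again.
cut -f1 sample.txt > col1.txt
cut -f2 sample.txt | tr '[:upper:]' '[:lower:]' > col2_lower.txt
paste col1.txt col2_lower.txt > three_step.txt

# One-pass variant: lowercase only the second field, keep tabs as the delimiter.
awk -F'\t' -v OFS='\t' '{ $2 = tolower($2); print }' sample.txt > one_pass.txt

# cmp -s is silent and succeeds only if both files are byte-for-byte identical.
cmp -s three_step.txt one_pass.txt && echo "identical"
```

The one-pass form avoids intermediate files, while the three-step form leaves you with separate column files that can be spell-checked on their own.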
Creating a word list from a source text
Use this command in the Terminal to create a word list from a source text, while excluding all words in a given list:
fmt -1 source.txt | tr -s '\t' ' ' | sed -e 's/^[ ]*//' | tr -d '[:punct:]' | tr -d '[:digit:]' | grep -w -i -v -f exclude.txt | sort | uniq > wordlist.txt
| Command | Explanation |
|---|---|
| fmt -1 source.txt | Reformat (word wrap) the source text so that each word is on a separate line. |
| tr -s '\t' ' ' | Replace tab characters with spaces and squeeze runs of spaces into one. |
| sed -e 's/^[ ]*//' | Remove leading spaces. |
| tr -d '[:punct:]' \| tr -d '[:digit:]' | Remove punctuation characters and digits. |
| grep -w -i -v -f exclude.txt | Drop every word that appears in the list 'exclude.txt' (whole-word match, case-insensitive). |
| sort \| uniq | Sort the list and remove all duplicates. |
| > wordlist.txt | Send the output to the file 'wordlist.txt'. |
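As a worked example of the pipeline described above, here is a sketch that runs it on a one-sentence source text with a two-word exclusion list. The sample sentence and the contents of exclude.txt are invented for illustration, and the character classes are quoted so that the shell cannot expand them as file name patterns.

```shell
# Hypothetical sample input: one sentence and a two-word exclusion list.
printf 'The quick brown fox, the lazy dog and the fox.\n' > source.txt
printf 'the\nand\n' > exclude.txt

# One word per line, strip punctuation and digits, drop excluded words, sort, deduplicate.
fmt -1 source.txt | tr -s '\t' ' ' | sed -e 's/^[ ]*//' \
  | tr -d '[:punct:]' | tr -d '[:digit:]' \
  | grep -w -i -v -f exclude.txt | sort | uniq > wordlist.txt

cat wordlist.txt   # brown, dog, fox, lazy, quick (one per line)
```

Two things to watch for on real texts: a token that consists only of digits or punctuation leaves an empty line in the word list, and an empty line in exclude.txt matches every input line, so grep -v would then discard everything.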
Download a list with German stop words.