Translating OCRed source files

New-OCR-test

When translating OCRed source files, choose the new filter for OCR files, ‘Ms Word OCR’. It suppresses nearly all unnecessary tags.
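To illustrate where the extra tags tend to come from: OCR software typically splits a paragraph into many small formatting runs, and each run boundary can surface as an inline tag in a translation tool. The sketch below counts the runs in a made-up WordprocessingML paragraph and joins their text. The sample XML and the function name `summarize_paragraph` are assumptions made for illustration. This is not CafeTran's filter code, only a way to see the fragmentation.

```python
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML (the XML inside a DOCX file).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical paragraph resembling OCR output: one short phrase split into
# three runs that differ only in small spacing or font-size settings.
SAMPLE_PARAGRAPH = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:rPr><w:spacing w:val="-2"/></w:rPr><w:t>Trans</w:t></w:r>
  <w:r><w:rPr><w:spacing w:val="1"/></w:rPr><w:t>lating </w:t></w:r>
  <w:r><w:rPr><w:sz w:val="21"/></w:rPr><w:t>files</w:t></w:r>
</w:p>
""".strip()


def summarize_paragraph(paragraph_xml):
    """Return (number of runs, concatenated text) for one w:p element."""
    paragraph = ET.fromstring(paragraph_xml)
    runs = paragraph.findall("w:r", NS)
    # Join the text of every w:t element, ignoring run formatting.
    text = "".join(
        t.text or "" for run in runs for t in run.findall("w:t", NS)
    )
    return len(runs), text


run_count, plain_text = summarize_paragraph(SAMPLE_PARAGRAPH)
print(run_count, repr(plain_text))
```

In this sample the phrase is spread over three runs, so a filter that keeps every run boundary would show tags in the middle of a word, while a filter that ignores such minor formatting differences would show plain text.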

Example scans from ABBYY

Example Word documents

Example segments in CafeTran, using the regular MS Word DOCX filter:

01.png
02.png
03.png
04.png
05.png

The same segments in CafeTran, using the special MS Word OCR DOCX filter:

06.png
07.png
08.png
09.png
10.png

Some real-world examples:

Regular Word filter:

1.png
2.png
3.png

Special OCR filter:

1a.png
2a.png
3a.png

See also: Using Condense to OCR snippets

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License