OCR Languages

RustyLens uses Tesseract for OCR and supports all Tesseract language packs, which together cover over 100 languages.

How language detection works

At startup, RustyLens scans the tessdata directory (typically /usr/share/tessdata/) for .traineddata files and populates the language dropdown automatically. Any language pack you install is picked up on the next launch.
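The scan described above can be approximated from a shell. This is a minimal sketch, not RustyLens's actual implementation: the function name `list_tessdata_langs` is hypothetical, and the default directory is assumed to be /usr/share/tessdata. It simply strips the .traineddata suffix from each file name, so auxiliary files such as osd.traineddata would be listed too unless filtered out.

```shell
# Print the language code for each .traineddata file in a tessdata directory.
# Hypothetical helper; the default path is an assumption based on the text above.
list_tessdata_langs() {
    dir="${1:-/usr/share/tessdata}"
    for f in "$dir"/*.traineddata; do
        # Skip the unexpanded glob pattern when the directory has no matches.
        [ -e "$f" ] || continue
        basename "$f" .traineddata
    done
}
```

As a cross-check, `tesseract --list-langs` prints the languages that Tesseract itself can find.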

Installing language packs

Via `install.sh`

The easiest way to add languages:

./install.sh --european        # Western + Eastern European
./install.sh --asian           # Chinese, Japanese, Korean, Vietnamese, Thai
./install.sh --langs "fra jpn" # Custom selection

See install.sh for the full list of language group flags.

Via your package manager

Arch:

sudo pacman -S tesseract-data-fra   # French
sudo pacman -S tesseract-data-jpn   # Japanese

Debian / Ubuntu:

sudo apt install tesseract-ocr-fra
sudo apt install tesseract-ocr-jpn

Fedora:

sudo dnf install tesseract-langpack-fra
sudo dnf install tesseract-langpack-jpn

Direct tessdata download

If your package manager does not carry a specific language, download it directly from the tessdata_fast repository:

sudo curl -fsSL \
  https://github.com/tesseract-ocr/tessdata_fast/raw/main/fra.traineddata \
  -o /usr/share/tessdata/fra.traineddata

Alternatively, pass --download to install.sh to force a direct download for all selected languages:

./install.sh --langs "fra jpn" --download
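To fetch several languages by hand, the curl command above can be wrapped in a loop. The following is a rough sketch and is not necessarily what `install.sh --download` does: the helper names `tessdata_url` and `fetch_langs` are hypothetical, and the URL pattern is taken from the French example above. It assumes curl is installed and that the target directory is writable.

```shell
# Build the tessdata_fast download URL for a language code.
# tessdata_url is a hypothetical helper name.
tessdata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/main/%s.traineddata\n' "$1"
}

# Download each requested language into a tessdata directory.
# Usage: fetch_langs DEST_DIR CODE [CODE...]
# Stops at the first failed download and returns a non-zero status.
fetch_langs() {
    dest="$1"
    shift
    for code in "$@"; do
        curl -fsSL "$(tessdata_url "$code")" -o "$dest/$code.traineddata" || return 1
    done
}
```

For example, `fetch_langs /usr/share/tessdata fra jpn` would download both files; writing to a system directory like this one requires running the script as root.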

Common language codes

Code	Language
`eng`	English
`deu`	German
`fra`	French
`spa`	Spanish
`ita`	Italian
`por`	Portuguese
`rus`	Russian
`jpn`	Japanese
`chi_sim`	Chinese (Simplified)
`chi_tra`	Chinese (Traditional)
`kor`	Korean
`ara`	Arabic
`hin`	Hindi
`ben`	Bengali
`vie`	Vietnamese
`tha`	Thai

For the complete list of codes, see the Tesseract data files documentation.

Using multiple languages

The Auto (all) option in the dropdown runs OCR using every installed language simultaneously. This is helpful when you are unsure of the image's language, but increases processing time and may reduce per-language accuracy.

For best results, select the specific language that matches your image content.
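At the Tesseract command-line level, several languages are combined by joining their codes with `+` in the `-l` option. The sketch below builds such a string from a list of codes. Whether the Auto (all) option does this internally is an assumption, and `join_langs` is a hypothetical helper name.

```shell
# Join language codes with '+', the separator Tesseract's -l option expects.
# join_langs is a hypothetical helper name.
join_langs() {
    out=""
    for code in "$@"; do
        # Prepend "<previous>+" only when something has already been collected.
        out="${out:+$out+}$code"
    done
    printf '%s\n' "$out"
}
```

For example, `tesseract scan.png stdout -l "$(join_langs eng fra)"` runs OCR on a hypothetical scan.png with English and French loaded together.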