How to Install The Latest Tesseract OCR 5 in Ubuntu 20.04 / 18.04 / 22.04

Last updated: June 3, 2023

This simple tutorial shows how to install the latest Tesseract OCR engine in all current Ubuntu releases via PPA.

Tesseract is one of the most accurate open-source OCR engines. It reads a wide variety of image formats and converts them to text in over 40 languages. The Tesseract 5.0.0 release features:

  • Faster training and OCR performance with lower memory usage via ‘fast floats’.
  • Support for the latest macOS and Apple Silicon.
  • Better ARM/ARM64 support.
  • API improvements and more.

How to Install Tesseract OCR in Ubuntu:

The optical character recognition engine is available in the Ubuntu repositories, though the packaged version is usually outdated.

Alexander Pozdnyakov, the maintainer of Tesseract OCR in the official Debian/Ubuntu repositories, also maintains a few PPAs with the latest packages. Most CPU architectures (amd64, i386, arm64/armhf, ppc64el, s390x) are supported.

Option 1: Add Tesseract 4.x PPA

For the latest release of Tesseract OCR 4 (v4.1.3 so far), the stable PPA provides packages for Ubuntu 18.04, Ubuntu 20.04, Ubuntu 21.10, and the older Ubuntu 16.04/14.04.

Press Ctrl+Alt+T on the keyboard to open a terminal. When it opens, run the command below to add the PPA:

sudo add-apt-repository ppa:alex-p/tesseract-ocr

Type your user password when asked (there is no visual feedback while typing) and hit Enter to continue.

Option 2: Add Tesseract 5 PPA

The new 5.x release series is available in another PPA for Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, and Ubuntu 23.04.

Again, press Ctrl+Alt+T to open a terminal and run the command:

sudo add-apt-repository ppa:alex-p/tesseract-ocr5

NOTE: installing Tesseract from this PPA will replace the old 4.x packages, and the 5.x series is not 100% API compatible with v4.0.
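Since the two PPAs differ only by a trailing "5", it is easy to add the wrong one. As a minimal sketch, the choice between Option 1 and Option 2 could be wrapped in a small helper. The function name tesseract_ppa is made up for illustration, and the script only prints the command rather than running it:

```shell
#!/bin/sh
# Map a Tesseract major version to the PPA named in this tutorial.
# tesseract_ppa is a hypothetical helper name, not part of any package.
tesseract_ppa() {
    case "$1" in
        4) echo "ppa:alex-p/tesseract-ocr" ;;
        5) echo "ppa:alex-p/tesseract-ocr5" ;;
        *) echo "unsupported major version: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command instead of executing it.
echo "sudo add-apt-repository $(tesseract_ppa 5)"
```

Copy the printed line into the terminal once it shows the PPA you intended.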

Option 3: Add Tesseract repository for Debian

For Debian Stretch, Buster, Bullseye, and Sid, there are apt repositories for both Tesseract v4 and v5. Users of these Debian releases, as well as Ubuntu 21.10 users, can add the repository by following the maintainer's setup instructions.

Update and Install Tesseract:

After adding a PPA or repository from the options above, run the command below in a terminal to refresh the system package cache. This is mainly needed if you're still running an older release, i.e. Ubuntu 18.04 or earlier:

sudo apt update
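Before installing, it can help to confirm that the refreshed cache really offers the version from the PPA. The sketch below assumes the usual `apt-cache policy` output layout, with a "Candidate:" line. The helper name candidate_version is hypothetical:

```shell
#!/bin/sh
# Extract the "Candidate:" version from `apt-cache policy` output on stdin.
# candidate_version is a hypothetical helper name used for illustration.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

if command -v apt-cache >/dev/null 2>&1; then
    # Prints the version apt would install, or nothing if the package is unknown.
    apt-cache policy tesseract-ocr | candidate_version
else
    echo "apt-cache not found; run this on a Debian/Ubuntu system"
fi
```

If the printed version still starts with the old release number, the PPA was probably not added or the cache was not refreshed.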

Finally, install the OCR engine with the command:

sudo apt install tesseract-ocr

Or, upgrade the package using Software Updater.
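After installing or upgrading, a quick check shows which major version is actually on the PATH. This is a sketch that assumes the first line of `tesseract --version` looks like "tesseract 5.x.y". The helper name tesseract_major is made up for illustration:

```shell
#!/bin/sh
# Pull the major version out of the first line of `tesseract --version` output.
# tesseract_major is a hypothetical helper name, not a Tesseract command.
tesseract_major() {
    sed -n '1s/^tesseract v\{0,1\}\([0-9]\{1,\}\)\..*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
    # Print the major version, then the language data Tesseract can find.
    tesseract --version 2>&1 | tesseract_major
    tesseract --list-langs || echo "could not list languages (is tessdata installed?)"
else
    echo "tesseract is not on PATH yet"
fi
```

Once the check passes, a basic run looks like `tesseract image.png output -l eng`, where image.png is a placeholder for your own file and the recognized text is written to output.txt.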

How to Remove PPAs & uninstall Tesseract OCR:

To remove the PPAs, either run the previous add-apt-repository command with the --remove flag, or use the Software & Updates utility under the ‘Other Software’ tab.
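If you no longer remember which of the two PPAs was added, the following sketch looks for them and prints the matching removal commands without running anything. It assumes PPA source files live under /etc/apt/sources.list.d and contain the PPA path, and that GNU grep is available. The function name configured_tesseract_ppas is hypothetical:

```shell
#!/bin/sh
# List the Tesseract PPAs from this tutorial that are configured on the system.
# configured_tesseract_ppas is a hypothetical helper name; the optional
# argument overrides the directory to scan.
configured_tesseract_ppas() {
    dir="${1:-/etc/apt/sources.list.d}"
    # -r recurse, -h no file names, -o matched text only, -s ignore missing files
    grep -rhos 'alex-p/tesseract-ocr5\{0,1\}' "$dir" | sort -u | sed 's/^/ppa:/'
}

# Dry run: print one removal command per PPA found.
configured_tesseract_ppas | while read -r ppa; do
    echo "sudo add-apt-repository --remove $ppa"
done
```

No output means neither PPA was found in that directory.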

To remove the OCR engine, use the command:

sudo apt remove --autoremove tesseract-ocr 'tesseract-ocr-*'

You may also remove the libtesseract* packages. Note, however, that doing so will also remove other applications (e.g., gImageReader) that depend on them.

I'm a freelance blogger who started using Ubuntu in 2007 and wishes to share my experiences and some useful tips with Ubuntu beginners and lovers. Please comment to let me know if the tutorial is outdated! And, notify me if you find any typo/grammar/language mistakes. English is not my native language. Contact me via ubuntuhandbook1@gmail.com Buy me a coffee: https://ko-fi.com/ubuntuhandbook1

3 responses to How to Install The Latest Tesseract OCR 5 in Ubuntu 20.04 / 18.04 / 22.04

  1. Hi, do you use this? What APIs are out there to use with it?

  2. I already had Tesseract 4 installed on my Ubuntu 20.04 system. The install of 5 went a bit differently than it did on yours. It installed tesseract-ocr as an upgrade, and did not install tesseract-ocr-osd. I think it saw the 4.0 version as satisfying the dependency, but the files are in the wrong place for 5.0.

    When I first tried to run 5.0, it failed with:
    Error opening data file /usr/share/tesseract-ocr/5/tessdata/eng.traineddata

    I found the simple fix on Stack Overflow, under the title “Tesseract running error”: you just need to download the eng.traineddata from GitHub, and copy it to where it belongs.

    (I’m leaving out URLs because they freak out some blogging software’s spam filters).

    Thanks for doing this guide: I wasn’t even sure that 5.0 was released, and you let me know that it was and saved me the hassle of finding the PPA to upgrade.

    Ran

  3. Michael Bloom June 10, 2023 at 6:05 am

    When trying to add your tesseract 5 repository to a bionic system, I am getting the following error:

    Executing: /tmp/apt-key-gpghome.uMfjMffagF/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 8529B1E0F8BF7F65C12FABB0A4BCBD87CEF9E52D
    gpg: keyserver receive failed: Invalid argument