9.5 Tesseract Installation

This topic provides step-by-step instructions for installing Tesseract.

Prerequisites

Build Tools
Make sure that the following build tools are available:
  • GNU Autotools: autoconf, automake, libtool
  • CMake (optional; used only if the Autoconf build of Leptonica fails)

    Both should be available from the Oracle yum repository.
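
Before starting, it can help to confirm that the build tools are on the PATH. The following is a minimal sketch, not part of the official procedure: the function name check_tools is hypothetical, and it assumes each tool is invoked under the command name listed above.

```shell
# Hypothetical helper: report whether each named tool is on the PATH.
# Returns a non-zero status if at least one tool is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return $status
}

check_tools autoconf automake libtool cmake ||
    echo "Install the missing tools through yum before building."
```

CMake is only needed for the fallback build, so a "missing: cmake" line alone does not block the Autoconf route.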

Dependent Libraries

The following libraries must be present on the server. By default, they are available on Oracle Linux. If any library is missing, install it through yum with the following command:

sudo yum install <LIBRARY_NAME>
The library names are as follows:
  • libjpeg
  • libtiff
  • zlib
  • libjpeg-turbo
  • libwebp
  • libpng-devel
  • libtiff-devel
  • libwebp-devel
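
To see which of these libraries still need to be installed, the list can be checked in one pass. This is a sketch under stated assumptions: it assumes an RPM-based system where rpm -q is available, and the function name list_missing is hypothetical. Note that rpm -q matches package names exactly, so a library supplied by a differently named package may be reported as missing even though it is present.

```shell
# Library names taken from the list above.
packages="libjpeg libtiff zlib libjpeg-turbo libwebp libpng-devel libtiff-devel libwebp-devel"

# Hypothetical helper: print every package that rpm does not report as installed.
list_missing() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# $packages is left unquoted on purpose so that it splits into separate names.
missing=$(list_missing $packages)
if [ -n "$missing" ]; then
    echo "Not installed:" $missing
    echo "Install with: sudo yum install" $missing
else
    echo "All listed libraries are installed."
fi
```

The script only prints the yum command; it does not install anything itself.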

Note:

If you are using a distribution other than Oracle Linux, install the libraries from the official Oracle repository or from any other repository available for that distribution.
Installation Files

Download the installation files required to install and set up Tesseract. The files are available at <Unzip the file>/THIRD_PARTY_SOFTWARES/Tesseract.

The directory contains the following files:
  • leptonica-1.84.1.tar.gz
  • tesseract-5.4.1.tar.gz
  • eng.traineddata
  • osd.traineddata
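
A short check that all four files were copied can save a failed build later. The sketch below assumes the files sit in one directory, given through a hypothetical SRC_DIR variable that defaults to /scratch (the example location used in this topic); the function name missing_files is also hypothetical.

```shell
# Hypothetical helper: print each expected installation file that is absent from DIR.
missing_files() {
    dir="$1"
    for f in leptonica-1.84.1.tar.gz tesseract-5.4.1.tar.gz eng.traineddata osd.traineddata; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# SRC_DIR is an assumed variable name; /scratch is the example directory from this topic.
absent=$(missing_files "${SRC_DIR:-/scratch}")
if [ -n "$absent" ]; then
    echo "Missing installation files:" $absent
else
    echo "All installation files are present."
fi
```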

Leptonica Installation

Tesseract uses Leptonica internally for image processing. Leptonica can be built and installed using either Autoconf or CMake.

Note:

If the user already has full access to all installation directories, sudo is not required.

>sudo LINUX_COMMAND (if the user does not have file access permissions)

>LINUX_COMMAND (if the user has full access, for example a DBA user or the root user)

Note:

In this topic, all commands are executed with sudo. You can omit sudo if your user permissions allow it.
Installation through Autoconf
  • Copy the downloaded Leptonica tarball (leptonica-1.84.1.tar.gz) to the server (installation directory). For example: /scratch.
  • Execute the commands below sequentially to install Leptonica through Autoconf.

    Note:

    Line 4 of the commands below uses sudo make -j4, where 4 is the number of CPU cores. In general, use sudo make -jn, where n is the number of cores available; this makes the build much faster.

    If the processor does not have multiple cores, use the plain command sudo make.

    sudo tar xvf leptonica-1.84.1.tar.gz
    cd leptonica-1.84.1
    sudo ./configure
    sudo make -j4
    sudo make install
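
Rather than hard-coding 4, the core count for the -j option can be read from the system. This is a minimal sketch: it assumes nproc (GNU coreutils) or getconf is available and falls back to a single job otherwise. The variable name cores is illustrative.

```shell
# Ask the system for the number of online cores; fall back to 1 if neither tool works.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print the resulting build command instead of running it.
echo "Build command: sudo make -j$cores"
```

Once the printed value looks right, run the command itself in place of the echo line.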

Note:

If the installation is successful, go to Leptonica Configuration. Otherwise, go to Installation through CMake.

Installation through CMake
  • If the installation through Autoconf fails to generate the configure file or has any other error, follow the commands below to build through CMake.
    sudo tar xvf leptonica-1.84.1.tar.gz
    cd leptonica-1.84.1
    sudo mkdir build
    cd build
    sudo cmake ..
    sudo make -j4
    sudo make install
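
The choice between the two routes can be partly checked up front. The sketch below is illustrative only: the function name build_method is hypothetical, and it covers just the case where the configure script is absent from the extracted source. An Autoconf build that fails at a later step still requires switching to CMake manually.

```shell
# Hypothetical helper: report which build route the extracted source directory supports.
build_method() {
    src="$1"
    if [ -x "$src/configure" ]; then
        echo autoconf      # configure script is present and executable
    elif [ -f "$src/CMakeLists.txt" ]; then
        echo cmake         # no configure script, but a CMake project file exists
    else
        echo none          # directory missing or not a Leptonica source tree
    fi
}

build_method leptonica-1.84.1
```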
Leptonica Configuration
  • Configure the Leptonica path so that the Tesseract build can find the Leptonica installation.
  • Add the Leptonica installation directory to the library path. Example: /usr/local/lib, /usr/lib, /usr/lib64, etc.
  • Configure the Leptonica header path. Example: /usr/local/include/leptonica.
  • Set up the pkg-config path by executing the commands below.
    export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
    export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/lib64/pkgconfig/
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
    export LIBLEPT_HEADERSDIR=/usr/local/include/leptonica

    Note:

    Sometimes Tesseract is still unable to find the lept.pc file and reports configuration errors, for example: Leptonica 1.74 or higher is required. In that case, locate the lept.pc file (usually present at /usr/local/lib/pkgconfig/) with the command locate lept.pc and copy it to the /usr/lib64/pkgconfig directory.
    sudo cp /usr/local/lib/pkgconfig/lept.pc /usr/lib64/pkgconfig/
    Similarly, some services might not be able to find the Leptonica shared object files (.so files, for example liblept.so, libleptonica.so).

    Note:

    .so files are usually present on the server at /usr/local/lib.
  • Type whereis libleptonica or locate libleptonica to find the path, then copy the .so files to /usr/lib64.
    cd /usr/local/lib
    sudo cp -a *liblept* /usr/lib64
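
The export commands in this section append to variables that may be empty, which leaves a leading colon in the value (in LD_LIBRARY_PATH, an empty entry is treated as the current directory). The sketch below shows one way to append safely and then confirm that pkg-config can see Leptonica. The helper name path_append is hypothetical; pkg-config --exists and --modversion are standard pkg-config options, and lept is the module name that corresponds to lept.pc.

```shell
# Hypothetical helper: print VALUE ($1) with DIR ($2) appended,
# skipping duplicates and avoiding a leading colon when VALUE is empty.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

PKG_CONFIG_PATH=$(path_append "${PKG_CONFIG_PATH:-}" /usr/local/lib/pkgconfig)
PKG_CONFIG_PATH=$(path_append "$PKG_CONFIG_PATH" /usr/lib64/pkgconfig)
LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" /usr/lib)
LD_LIBRARY_PATH=$(path_append "$LD_LIBRARY_PATH" /usr/local/lib)
export PKG_CONFIG_PATH LD_LIBRARY_PATH

# Confirm that pkg-config can now see lept.pc.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists lept; then
    echo "pkg-config finds Leptonica $(pkg-config --modversion lept)"
else
    echo "lept.pc is not visible to pkg-config yet"
fi
```

If the last message appears, the lept.pc copy step described above is the next thing to try.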
Tesseract Installation
  • Copy the Tesseract tarball tesseract-5.4.1.tar.gz to the server (installation directory). For example, /scratch.
  • Copy the Tesseract trained files eng.traineddata, osd.traineddata to the server.
  • Execute the commands below sequentially to build and install Tesseract.

    Note:

    /usr/bin is the directory where the tesseract binary is installed if you pass --prefix=/usr to configure. You can choose a different prefix based on where you want to install.
    sudo tar xvf tesseract-5.4.1.tar.gz
    cd tesseract-5.4.1
    sudo ./autogen.sh
    sudo ./configure --prefix=/usr
    sudo make -j4
    sudo make install
  • Copy the traineddata files to the tessdata directory.

    If you use --prefix=/usr, the tessdata directory is under /usr/share (/usr/share/tessdata). If you use --prefix=/usr/local, it is under /usr/local/share (/usr/local/share/tessdata).

    sudo cp osd.traineddata /usr/share/tessdata
    sudo cp eng.traineddata /usr/share/tessdata
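
Because the tessdata location follows the configure prefix, the copy commands can be derived from a single value. This is a sketch only: the PREFIX variable and the tessdata_dir function are hypothetical names, and the script prints the copy commands rather than running them.

```shell
# Hypothetical helper: print the tessdata directory for a given configure prefix.
# A trailing slash on the prefix is dropped before the path is built.
tessdata_dir() {
    printf '%s\n' "${1%/}/share/tessdata"
}

# PREFIX is an assumed variable; it should match the value passed to --prefix.
PREFIX="${PREFIX:-/usr}"

for f in osd.traineddata eng.traineddata; do
    echo "sudo cp $f $(tessdata_dir "$PREFIX")"
done
```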
Tesseract Configuration
  • Execute the commands below to set the Tesseract library path.
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
  • Sometimes services are unable to find the libtesseract shared object files (.so files) present on the system (usually at /usr/lib). In that case, copy the libtesseract files to /usr/lib64.
    cd /usr/lib
    sudo cp -a *libtesseract* /usr/lib64
  • Some programs search for the tessdata directory in a different path (/usr/share/tesseract/4/tessdata). Copy the existing tessdata directory to that path (under either /usr/share or /usr/local/share, based on your installation).
    cd /usr/share
    sudo mkdir tesseract    # execute only if the tesseract directory is not present
    cd tesseract
    sudo mkdir 4
    cd /usr/share
    sudo cp -R tessdata /usr/share/tesseract/4
  • Run the command below to set the tessdata prefix.
    export TESSDATA_PREFIX=/usr/share/tesseract/4/tessdata

    Tesseract is now installed.

  • Verify the version with the command below.
    tesseract --version

    The output shows the Tesseract version (5.4.1) and the Leptonica version (1.84.1), along with the other default libraries (libjpeg, libjpeg-turbo, libpng, libtiff, zlib).
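
The version check can be extended into a short post-installation test. The sketch below assumes that the first line of the tesseract --version output has the form "tesseract <version>"; the function name tesseract_version is hypothetical. It also calls tesseract --list-langs, which should list eng and osd if the traineddata files were copied to the directory that Tesseract searches.

```shell
# Hypothetical helper: read `tesseract --version` output on stdin and
# print the version number from the first line.
tesseract_version() {
    sed -n '1s/^tesseract[[:space:]]*//p'
}

if command -v tesseract >/dev/null 2>&1; then
    echo "Tesseract version: $(tesseract --version 2>&1 | tesseract_version)"
    # List the languages Tesseract can load; eng and osd should appear.
    tesseract --list-langs 2>&1 || echo "Could not list languages; check TESSDATA_PREFIX."
else
    echo "tesseract is not on the PATH"
fi
```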