9.5 Tesseract Installation
This topic provides step-by-step instructions for installing Tesseract.
Prerequisites
- GNU Autotools: autoconf, automake, libtool
- CMake (optional; used only if autoconf fails to build Leptonica)
Both are available from the Oracle yum repository.
The following libraries must be present on the server. By default, they are available on Oracle Linux. If a library is not present, install it through yum with the following command:
sudo yum install <LIBRARY_NAME>
- libjpeg
- libtiff
- zlib
- libjpeg-turbo
- libwebp
- libpng-devel
- libtiff-devel
- libwebp-devel
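The check-then-install step above can be sketched as a small shell helper. This is a minimal sketch, assuming an RPM-based system such as Oracle Linux. The function name missing_libs is hypothetical, and the package names are taken verbatim from the list above.

```shell
#!/usr/bin/env bash
# Library names copied from the list above.
LIBS="libjpeg libtiff zlib libjpeg-turbo libwebp libpng-devel libtiff-devel libwebp-devel"

# Hypothetical helper: print every name that no installed package provides.
# "rpm -q --whatprovides" also matches a capability that is supplied by a
# package with a different name.
missing_libs() {
    for lib in $LIBS; do
        rpm -q --whatprovides "$lib" >/dev/null 2>&1 || echo "$lib"
    done
}

# Example: install whatever is missing (commented out because it changes the system).
# missing_libs | xargs -r sudo yum install -y
```

On a system without rpm, every name is reported as missing, so the helper is only meaningful on RPM-based distributions.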
Note:
If you are using any distribution other than Oracle Linux, install the libraries from the official Oracle repo or any other repo available for that distribution.
Download the installation files required to install and set up Tesseract. The files are available at <Unzip the file>/THIRD_PARTY_SOFTWARES/Tesseract.
- leptonica-1.84.1.tar.gz
- tesseract-5.4.1.tar.gz
- eng.traineddata
- osd.traineddata
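Before starting the build, it can help to confirm that all four files are present. The following is a minimal sketch; the function name check_install_files is hypothetical, and the directory argument is whatever location the files were copied to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the four installation files listed above
# exist in the given directory. Returns 0 when all are present, 1 otherwise.
check_install_files() {
    dir="$1"
    status=0
    for f in leptonica-1.84.1.tar.gz tesseract-5.4.1.tar.gz eng.traineddata osd.traineddata; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}

# Example: check_install_files /scratch && echo "all installation files found"
```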
Leptonica Installation
Tesseract uses Leptonica internally for image processing. Leptonica can be built and installed with either Autoconf or CMake.
Note:
If the user already has full access to all installation directories, sudo is not required.
>sudo LINUX_COMMAND (if the user does not have file access permissions)
>LINUX_COMMAND (if the user has full access, for example a DBA user or the root user)
Note:
In this topic, all commands are executed with sudo. You can omit sudo depending on your user permissions.
- Copy the downloaded Leptonica tarball (leptonica-1.84.1.tar.gz) to the server (installation directory). For example: /scratch.
- Execute the commands below sequentially to install Leptonica through autoconf.
Note:
Line 4 of the commands below uses sudo make -j4, where 4 is the number of CPU cores used for the build. In general, use sudo make -jn, where n is the number of cores; this makes the build much faster. If the processor does not have multiple cores, use the plain command sudo make.
sudo tar xvf leptonica-1.84.1.tar.gz
cd leptonica-1.84.1
sudo ./configure
sudo make -j4
sudo make install
Note:
If the installation is successful, go to Leptonica Configuration. Otherwise, go to Install through CMake.
- If the installation through Autoconf fails to generate the configure file or reports any other error, follow the commands below to build through CMake.
sudo tar xvf leptonica-1.84.1.tar.gz
cd leptonica-1.84.1
sudo mkdir build
cd build
sudo cmake ..
sudo make -j4
sudo make install
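The two build paths and the core-count advice can be combined into one script. This is a sketch under stated assumptions, not a tested installer: the function names job_count, build_autoconf, build_cmake and build_leptonica are hypothetical, and the script assumes that nproc or getconf is available to detect the number of cores.

```shell
#!/usr/bin/env bash
# Number of parallel make jobs: the detected core count, or 1 if detection fails.
job_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Autoconf build, same commands as in the steps above.
build_autoconf() {
    sudo ./configure && sudo make -j"$(job_count)" && sudo make install
}

# CMake build in a separate build directory, used as the fallback.
build_cmake() {
    sudo mkdir -p build && cd build &&
        sudo cmake .. && sudo make -j"$(job_count)" && sudo make install
}

# Hypothetical wrapper: try Autoconf first and fall back to CMake on any failure.
build_leptonica() {
    sudo tar xvf leptonica-1.84.1.tar.gz && cd leptonica-1.84.1 || return 1
    build_autoconf || build_cmake
}

echo "parallel make jobs: $(job_count)"
```

Running build_leptonica from the installation directory (for example, /scratch) would perform the build. Only the job count is printed when the script is run as shown.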
- Configure the Leptonica path so that Tesseract can find the Leptonica installation.
- Add the Leptonica installation directory to the library path. Example: /usr/local/lib, /usr/lib, /usr/lib64, etc.
- Configure the Leptonica header path. Example: /usr/local/include/leptonica.
- Set up the pkg-config path by executing the commands below.
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/lib64/pkgconfig/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include/leptonica
Note:
Sometimes Tesseract is still unable to find the lept.pc file and reports configuration errors, for example: Leptonica 1.74 or higher is required. In that case, locate the lept.pc file (usually present at /usr/local/lib/pkgconfig/) with the command locate lept.pc and copy it to the /usr/lib64/pkgconfig directory.
sudo cp /usr/local/lib/pkgconfig/lept.pc /usr/lib64/pkgconfig/
Similarly, some services might not be able to find the Leptonica shared object files (.so files, for example liblept.so, libleptonica.so).
Note:
.so files are usually present on the server at /usr/local/lib.
- Type whereis libleptonica or locate libleptonica to find the path, and copy the .so files to the /usr/lib64 path.
cd /usr/local/lib
sudo cp -a *liblept* /usr/lib64
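The path setup for Leptonica can be made repeatable with a small helper. When a variable is empty, a plain export of the form VAR=$VAR:/dir leaves a leading colon, and the dynamic linker treats an empty LD_LIBRARY_PATH entry as the current directory. The sketch below avoids that and skips directories that are already listed. The function name append_path is hypothetical, and the pkg-config check assumes that Leptonica registers itself under the module name lept, as the lept.pc file name suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a directory to a colon-separated path variable
# unless it is already listed, without leaving a leading colon.
# Usage: append_path <VARIABLE_NAME> <directory>
append_path() {
    var="$1"
    dir="$2"
    eval "cur=\${$var:-}"
    case ":$cur:" in
        *":$dir:"*) ;;   # already present, nothing to do
        *) eval "export $var=\"\${cur:+\$cur:}\$dir\"" ;;
    esac
}

append_path PKG_CONFIG_PATH /usr/local/lib/pkgconfig
append_path PKG_CONFIG_PATH /usr/lib64/pkgconfig
append_path LD_LIBRARY_PATH /usr/lib
append_path LD_LIBRARY_PATH /usr/local/lib

# Confirm that pkg-config can resolve Leptonica through lept.pc.
if command -v pkg-config >/dev/null 2>&1; then
    pkg-config --modversion lept || echo "lept.pc not found on PKG_CONFIG_PATH" >&2
fi
```

If the final check reports that lept.pc was not found, the copy commands described above are the next thing to try.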
- Copy the Tesseract tarball tesseract-5.4.1.tar.gz to the server (installation directory). For example, /scratch.
- Copy the Tesseract trained data files eng.traineddata and osd.traineddata to the server.
- Execute the commands below sequentially to build and install Tesseract.
Note:
/usr/bin is the directory where the tesseract binary will be present if you pass --prefix=/usr to configure. You can provide a different path based on where you want to install.
sudo tar xvf tesseract-5.4.1.tar.gz
cd tesseract-5.4.1
sudo ./autogen.sh
sudo ./configure --prefix=/usr
sudo make -j4
sudo make install
- Copy the traineddata files to the tessdata directory.
If you use --prefix=/usr, the tessdata directory is present at /usr/share. If you use --prefix=/usr/local, the tessdata directory is present at /usr/local/share.
sudo cp osd.traineddata /usr/share/tessdata
sudo cp eng.traineddata /usr/share/tessdata
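The relationship between the configure prefix and the tessdata directory can be expressed as a small helper, so that the copy step follows whichever prefix was chosen. This is a minimal sketch; the function names tessdata_dir and install_traineddata are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the tessdata directory from the configure prefix.
tessdata_dir() {
    prefix="${1%/}"   # drop one trailing slash, if any
    echo "$prefix/share/tessdata"
}

# Hypothetical helper: copy the two trained data files into that directory.
# Run it from the directory that holds the .traineddata files.
install_traineddata() {
    dest="$(tessdata_dir "$1")"
    sudo cp osd.traineddata eng.traineddata "$dest"
}

tessdata_dir /usr          # prints /usr/share/tessdata
tessdata_dir /usr/local    # prints /usr/local/share/tessdata
```

For an installation configured with --prefix=/usr, calling install_traineddata /usr performs the same copy as the two cp commands above.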
- Execute the commands below to set the Tesseract library path.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
- Sometimes services are unable to find the libtesseract shared object files (.so files) present in the system (usually at /usr/lib). In that case, copy the libtesseract files to /usr/lib64.
cd /usr/lib
sudo cp -a *libtesseract* /usr/lib64
- Some programs search for the tessdata directory in a different path (/usr/share/tesseract/4/tessdata). Copy the existing tessdata directory to that path (under either /usr/share or /usr/local/share, based on your installation).
cd /usr/share
sudo mkdir tesseract (execute only if the tesseract directory is not present)
cd tesseract
sudo mkdir 4
cd /usr/share
sudo cp -R tessdata /usr/share/tesseract/4
- Run the command below to set the tessdata prefix.
export TESSDATA_PREFIX=/usr/share/tesseract/4/tessdata
Tesseract is now installed.
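The directory creation and copy for the alternative tessdata path can be condensed with mkdir -p, which removes the need to check by hand whether the tesseract directory exists. This is a minimal sketch; the function name copy_tessdata is hypothetical, and it must be run by a user with write access to the target location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an existing tessdata directory under a second parent
# directory, creating the parent first. mkdir -p succeeds whether or not the
# directories already exist, so no manual existence check is needed.
# Usage: copy_tessdata <existing_tessdata_dir> <new_parent_dir>
copy_tessdata() {
    src="$1"
    dest_parent="$2"
    mkdir -p "$dest_parent" && cp -R "$src" "$dest_parent/"
}

# Example for a --prefix=/usr installation (requires write access to /usr/share):
# copy_tessdata /usr/share/tessdata /usr/share/tesseract/4
# export TESSDATA_PREFIX=/usr/share/tesseract/4/tessdata
```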
- Verify the version with the command below.
tesseract --version
It shows the Tesseract version (5.4.1) and the Leptonica version (1.84.1), along with other default libraries (libjpeg, libjpeg-turbo, libpng, libtiff, zlib).
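The verification can also be scripted. The sketch below assumes that the first line of the tesseract --version output has the form "tesseract <version>". The function name parse_tesseract_version and the EXPECTED variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "tesseract --version" output on stdin and print the
# version number, assuming the first line has the form "tesseract <version>".
parse_tesseract_version() {
    awk 'NR == 1 { print $2 }'
}

EXPECTED="5.4.1"

if command -v tesseract >/dev/null 2>&1; then
    found="$(tesseract --version 2>&1 | parse_tesseract_version)"
    if [ "$found" = "$EXPECTED" ]; then
        echo "Tesseract $found is installed"
    else
        echo "expected Tesseract $EXPECTED, found: $found" >&2
    fi
    # List the languages Tesseract can load; eng and osd should appear.
    tesseract --list-langs
else
    echo "tesseract is not on PATH" >&2
fi
```

If eng or osd is missing from the language list, the usual causes are a TESSDATA_PREFIX that points to the wrong directory or traineddata files that were not copied.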