OCR in PHP: Read Text from Images with Tesseract

Optical Character Recognition (OCR) is the process of converting printed text into a digital representation. It has all sorts of practical applications, from digitizing printed books and creating electronic records of receipts to number-plate recognition and even circumventing image-based CAPTCHAs.

Tesseract is an open source program for performing OCR. You can run it on *nix systems, Mac OS X and Windows, and with the help of a library we can use it in PHP applications. This tutorial is designed to show you how.

Installation

Preparation

To keep things simple and consistent, we’ll use a Virtual Machine to run the application, which we’ll provision using Vagrant. This will take care of installing PHP and Nginx, though we’ll install Tesseract separately to demonstrate the process.

If you want to install Tesseract on your own existing Debian-based system, you can skip this next part. Alternatively, visit the README for installation instructions on other *nix systems, Mac OS X (hint: use MacPorts!) or Windows.

Vagrant Setup

To set up Vagrant so that you can follow along with the tutorial, complete the following steps. Alternatively, you can simply grab the code from GitHub.

Enter the following command to download the Homestead Improved Vagrant configuration to a directory named ocr:

git clone https://github.com/Swader/homestead_improved ocr

We’re not going to be using Laravel, so change the Nginx configuration in Homestead.yaml from:

sites:
    - map: homestead.app
      to: /home/vagrant/Code/Laravel/public

…to…

sites:
    - map: homestead.app
      to: /home/vagrant/Code/public
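If you would rather script that change than edit the file by hand, something along these lines should work. This is only a sketch: the function name retarget_site is made up for illustration, and it assumes the mapping line in your configuration file matches the one shown above exactly.

```shell
#!/bin/sh
# Hypothetical helper (not part of Homestead Improved): point the site
# mapping at Code/public instead of Code/Laravel/public.
retarget_site() {
    config_file="$1"
    tmp_file="${config_file}.tmp"
    # Write the result to a temporary file and move it into place,
    # because the in-place flag of sed differs between GNU and BSD.
    sed 's#/home/vagrant/Code/Laravel/public#/home/vagrant/Code/public#' \
        "$config_file" > "$tmp_file" &&
        mv "$tmp_file" "$config_file"
}
```

Call it from the ocr directory with the path to the Homestead configuration file as its only argument.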

You’ll also need to add the following to your hosts file:

192.168.10.10       homestead.app
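The hosts entry can also be added from a script. The sketch below assumes a Unix-like host where the hosts file lives at /etc/hosts; the function name add_hosts_entry is hypothetical, and the check is deliberately simple (it does not distinguish commented-out lines).

```shell
#!/bin/sh
# Hypothetical helper: append the Homestead entry to a hosts file
# only if the hostname is not already mentioned in it.
add_hosts_entry() {
    hosts_file="${1:-/etc/hosts}"
    # Look for the hostname preceded by whitespace and followed by
    # whitespace or the end of the line.
    if grep -Eq '[[:space:]]homestead\.app([[:space:]]|$)' "$hosts_file" 2>/dev/null; then
        echo "already present"
    else
        printf '%s\t%s\n' "192.168.10.10" "homestead.app" >> "$hosts_file"
        echo "added"
    fi
}
```

Because /etc/hosts is normally writable only by root, you would need to run this with elevated privileges. Trying it on a copy of the file first is a sensible precaution.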

Installing the Tesseract Binary

The next step is to install the Tesseract binary.

Because Homestead Improved uses a Debian-based distribution of Linux, we can use apt-get to install it after logging into the VM with vagrant ssh. It’s as simple as running the following command:

sudo apt-get install tesseract-ocr
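Once the package has installed, it is worth confirming that the binary is actually on the PATH before going any further. Here is a minimal sketch; check_binary is a hypothetical helper name, not something Tesseract or Homestead provides.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
# Returns a non-zero status when the command cannot be found.
check_binary() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}
```

Inside the VM, running check_binary tesseract && tesseract --version should print the installed version if everything went well.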

As I mentioned above, there are instructions for other operating systems in the README.

Source: SitePoint

By Lukas White
