In this tutorial, we will show you how to extract text from a scanned PDF in PHP using the PDF.co Web API. The image below shows the scanned PDF source file and its extracted text output.

  1. Source Files
  2. Start Server
  3. Run Program
  4. Demo
Sample Scanned PDF Input And Text Output

Step 1 – Source Files

First, open the HTML and PHP source files in your favorite editor. You can get the source code samples here.

We highly recommend saving the files in a folder inside the \www directory (WampServer) or the \htdocs directory (XAMPP) so that Apache can serve them.

Files Inside Directory

Step 2 – Start Server

Next, let’s start the Apache server so we can run our program.

Step 3 – Run Program

Now, let’s run our program and extract text from the scanned PDF.

  • In the browser address bar, type localhost/folder-name/sample.html. Here, folder-name is the folder where you stored the files; if you are using WampServer, it sits inside the /www directory.
  • In the API Key field, enter your PDF.co API Key. You can get it in your PDF.co dashboard.
  • Choose the scanned PDF file.
  • Leave the page number field empty to extract text from all pages of the PDF.

Then, click on the Proceed button.
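
To make the form fields more concrete, here is a minimal sketch of how a PHP script might turn the API key, the file location, and the page number field into a request description. It is not the code from the sample download. The helper name buildTextExtractionRequest is hypothetical, and the endpoint path, the x-api-key header, and the url and pages parameter names are assumptions that you should check against the PDF.co API reference. The sketch also assumes the chosen file has already been uploaded and is reachable by URL.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: describes the HTTP request for text extraction
 * without sending it, so the form-to-request mapping is easy to inspect.
 */
function buildTextExtractionRequest(string $apiKey, string $fileUrl, string $pages = ''): array
{
    if (trim($apiKey) === '') {
        throw new InvalidArgumentException('API key must not be empty.');
    }

    return [
        // Assumed endpoint; confirm it in the PDF.co API reference.
        'endpoint' => 'https://api.pdf.co/v1/pdf/convert/to/text',
        'headers'  => [
            'x-api-key: ' . $apiKey,          // assumed header name
            'Content-Type: application/json',
        ],
        'body' => json_encode([
            'url'   => $fileUrl,      // assumed: URL of the already uploaded PDF
            'pages' => trim($pages),  // assumed: empty string means all pages
        ], JSON_UNESCAPED_SLASHES),
    ];
}
```

The returned array can then be handed to PHP’s cURL functions, with the headers going to CURLOPT_HTTPHEADER and the body to CURLOPT_POSTFIELDS. Keeping the request construction separate from the network call makes it easy to check what an empty page number field actually sends.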

Run Program To Extract Text From Scanned PDF

Step 4 – Demo

Here’s a demo of the PDF to Text Web API in action.

Extract Text From Scanned PDF Demo

In this tutorial, you learned how to extract text from a scanned PDF in PHP using the PDF.co Web API. You also learned how to set up the source code samples so you can get up and running right away.