Extract Text from Scanned PDF in JavaScript using PDF.co Web API

In this tutorial, we’ll show you how to extract text from scanned PDFs in asynchronous mode using JavaScript and the PDF.co Web API. Asynchronous processing lets your application avoid request timeouts and continue other work while the conversion runs, which is especially useful for large PDF files.

Benefits of Asynchronous Mode

  1. Avoids Timeouts: Large files and complex conversions can take longer than a single request should stay open. In asynchronous mode, the API returns a job ID instead of holding the connection until the work is done, so your request does not time out.
  2. Optimized Workflow: Your application can continue executing other tasks while the server processes the PDF.
  3. Scalable: Asynchronous processing is ideal for applications handling multiple file conversions simultaneously.

For more details on asynchronous processing, refer to the PDF.co Asynchronous Processing Documentation.

Step 1: Source Code

Copy the JavaScript sample code provided here and open it in your favorite editor.

Step 2: Add Your PDF.co API Key

Add your API key in the appropriate section of the code (line 8). You can retrieve your API key from the PDF.co Dashboard.

Step 3: Provide the Scanned PDF Source URL

Specify the URL of the scanned PDF to be processed in line 11.

Step 4: Destination File

Define the output destination for the extracted text in line 17.
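
For orientation, the three values from Steps 2 to 4 might look like the sketch below. The constant names, the PDFCO_API_KEY environment variable, the example URL, and the validateSettings helper are all hypothetical and are not taken from the PDF.co sample, so match them to the names and line positions in the code you copied.

```javascript
// Hypothetical settings: names and values are placeholders, not the sample's own.
const API_KEY = process.env.PDFCO_API_KEY ?? "YOUR_API_KEY"; // Step 2
const SOURCE_FILE_URL = "https://example.com/scanned.pdf";   // Step 3
const DESTINATION_FILE = "./result.txt";                     // Step 4

// Return a list of human-readable problems; an empty list means the
// settings look complete enough to send a request.
function validateSettings({ apiKey, sourceUrl, destination }) {
  const problems = [];
  if (!apiKey || apiKey === "YOUR_API_KEY") {
    problems.push("API key is not set");
  }
  try {
    new URL(sourceUrl); // throws a TypeError if the string is not an absolute URL
  } catch {
    problems.push("source URL is not a valid URL");
  }
  if (!destination) {
    problems.push("destination path is empty");
  }
  return problems;
}

const problems = validateSettings({
  apiKey: API_KEY,
  sourceUrl: SOURCE_FILE_URL,
  destination: DESTINATION_FILE,
});
console.log(
  problems.length === 0 ? "Settings look complete" : `Check settings: ${problems.join("; ")}`
);
```

Reading the key from an environment variable is one way to keep it out of source control; hard-coding it as the sample suggests works too for a quick local test.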

How It Works

  1. Initiating an Asynchronous Conversion:
    • The code sends a request to the PDF.co API to start a PDF-to-TXT conversion with the async option set to true to enable asynchronous processing.
    • The API response contains a job ID and a URL to monitor the conversion status.
  2. Checking Job Status:
    • The code uses the job ID to periodically query the job's status by sending requests to the API.
    • When the status changes to "success", the conversion is complete and the result file is ready for download.
  3. Downloading the Result:
    • Once the conversion succeeds, the code downloads the TXT file from the provided URL and saves it to the specified local path.
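
The three stages above can be sketched roughly as follows. This is not the PDF.co sample itself. It assumes Node.js 18 or later (for the built-in fetch), the /v1/pdf/convert/to/text and /v1/job/check endpoints, the x-api-key header, the jobId, url, and status response fields, and a "working" status while the job is still running. The function names, the 3-second polling delay, and the attempt limit are illustrative choices; check all of them against the PDF.co API documentation before relying on them.

```javascript
// Sketch only: endpoint paths, field names, and timing values are assumptions.
const API_BASE = "https://api.pdf.co/v1";

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// POST a JSON payload to the API and parse the JSON reply.
async function postJson(path, payload, apiKey, fetchFn = fetch) {
  const response = await fetchFn(`${API_BASE}${path}`, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  return response.json();
}

// 1. Start the conversion with async: true and return the job details.
async function startConversion(sourceUrl, apiKey, fetchFn = fetch) {
  const job = await postJson(
    "/pdf/convert/to/text",
    { url: sourceUrl, async: true },
    apiKey,
    fetchFn
  );
  if (job.error) throw new Error(`Conversion request failed: ${job.message}`);
  return { jobId: job.jobId, resultUrl: job.url };
}

// 2. Poll the job until it reports "success"; stop on any other final status.
async function waitForJob(
  jobId,
  apiKey,
  { delayMs = 3000, maxAttempts = 100, fetchFn = fetch } = {}
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await postJson("/job/check", { jobid: jobId }, apiKey, fetchFn);
    if (status === "success") return;
    if (status !== "working") throw new Error(`Job ended with status "${status}"`);
    await sleep(delayMs);
  }
  throw new Error("Job did not finish within the allowed number of checks");
}

// 3. Download the result and return it as a string.
async function extractText(sourceUrl, apiKey, options = {}) {
  const fetchFn = options.fetchFn ?? fetch;
  const { jobId, resultUrl } = await startConversion(sourceUrl, apiKey, fetchFn);
  await waitForJob(jobId, apiKey, { ...options, fetchFn });
  const result = await fetchFn(resultUrl);
  return result.text();
}
```

A caller could then save the returned string to the destination from Step 4, for example with Node's fs.promises.writeFile. The optional fetchFn parameter exists only so the flow can be exercised with a stub instead of the live API.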

In this tutorial, you’ve learned how to extract text from scanned PDFs in asynchronous mode using the PDF.co Web API. With this mode, your application stays responsive while large or complex files are processed.
