Extract Text from Scanned PDF in JavaScript using PDF.co Web API
In this tutorial, we’ll show you how to extract text from scanned PDFs in JavaScript with the PDF.co Web API, using Asynchronous Mode. Asynchronous processing lets your application avoid timeouts and carry on with other operations while the conversion runs, which makes it especially useful for large PDF files.
Benefits of Asynchronous Mode
- Avoids Timeouts: Large files or complex conversions can take a while. In asynchronous mode the initial request returns right away with a job ID, so your application is not blocked and the request does not time out while the PDF is processed.
- Optimized Workflow: Your application can continue executing other tasks while the server processes the PDF.
- Scalable: Asynchronous processing is ideal for applications handling multiple file conversions simultaneously.
For more details on asynchronous processing, refer to the PDF.co Asynchronous Processing Documentation.
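To make the "continue with other tasks" and "multiple conversions" points concrete, here is a minimal sketch in plain Node.js of starting several jobs at once and awaiting them together. simulateConversion is a hypothetical stand-in that only waits on a timer; it does not call PDF.co. In a real application each promise would wrap one asynchronous PDF.co job.

```javascript
// Hypothetical stand-in for one asynchronous conversion job.
// It resolves with an output file name after `delayMs`; no API is called.
function simulateConversion(fileName, delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`${fileName}.txt`), delayMs);
  });
}

// Start every job before awaiting any of them, so the waits overlap.
async function convertAll(fileNames) {
  // Later files get shorter delays, so they finish first.
  const pending = fileNames.map((name, i) =>
    simulateConversion(name, 20 * (fileNames.length - i))
  );
  // The event loop is free for other work while the timers run.
  // Promise.all still returns results in input order, not finish order.
  return Promise.all(pending);
}

convertAll(["invoice", "contract", "receipt"]).then((files) => console.log(files));
```

Because every job is started before the first await, the total wait tracks the slowest job rather than the sum of all of them.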
Step 1: Source Code
Copy the JavaScript sample code provided here and open it in your favorite editor.
Step 2: Add Your PDF.co API Key
Add your API key in the appropriate section of the code (line 8). You can retrieve your API key from the PDF.co Dashboard.
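Rather than pasting the key directly into the source, you may prefer to load it from an environment variable. The sketch below assumes a variable named PDFCO_API_KEY (a hypothetical name; use whatever your setup defines) and builds the request headers, sending the key in the x-api-key header that PDF.co uses for authentication.

```javascript
// Read the API key from the environment instead of hard-coding it.
// PDFCO_API_KEY is a hypothetical variable name chosen for this sketch.
function getApiKey(env = process.env) {
  const key = env.PDFCO_API_KEY;
  if (!key) {
    throw new Error("PDFCO_API_KEY is not set");
  }
  return key;
}

// Headers for a JSON request; PDF.co reads the key from x-api-key.
function buildHeaders(apiKey) {
  return {
    "x-api-key": apiKey,
    "Content-Type": "application/json",
  };
}
```

You could then run the script as, for example, PDFCO_API_KEY=your_key node app.js, which keeps the key out of version control.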
Step 3: Provide the Scanned PDF Source URL
Specify the URL of the scanned PDF to process on line 11.
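The source URL ends up in the JSON body of the request that starts the job. The sketch below assumes PDF.co's v1 PDF-to-text endpoint and uses only the url and async fields; check the current API reference for the exact path and any optional parameters before relying on it. Note that the URL must be reachable by PDF.co's servers, not just from your own machine.

```javascript
// Endpoint and field names are assumptions based on PDF.co's v1 API;
// verify them against the current API reference.
const CONVERT_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/text";

// Build the JSON body that starts an asynchronous PDF-to-TXT job.
function buildConvertBody(sourceUrl) {
  // new URL() throws a TypeError for malformed input, so typos fail early.
  const { href } = new URL(sourceUrl);
  return JSON.stringify({
    url: href, // link to the scanned PDF that PDF.co will download
    async: true, // ask for a job ID instead of waiting for the result
  });
}
```

With Node.js 18 or later, the job could then be started with fetch(CONVERT_ENDPOINT, { method: "POST", headers, body: buildConvertBody(sourceUrl) }), reading the job ID from the JSON response (PDF.co names this field jobId).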
Step 4: Destination File
Set the destination path for the extracted text on line 17.
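The destination path is where the downloaded text gets written once the job has finished. This sketch uses Node's built-in fs/promises and path modules; downloadToFile is a hypothetical helper name, and fetchImpl defaults to the global fetch available in Node.js 18 and later, so it can be swapped for a stub in tests.

```javascript
const fs = require("fs/promises");
const path = require("path");

// Download the finished TXT file and save it to `destinationPath`.
// Resolves with the path written; rejects on a non-2xx HTTP response.
async function downloadToFile(resultUrl, destinationPath, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(resultUrl);
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  const text = await response.text();
  // Create the parent folder if needed, then write the text as UTF-8.
  await fs.mkdir(path.dirname(destinationPath), { recursive: true });
  await fs.writeFile(destinationPath, text, "utf8");
  return destinationPath;
}
```

Creating the parent folder with recursive: true means the destination can point into a directory that does not exist yet.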
How It Works
- Initiating an Asynchronous Conversion:
  - The code sends a request to the PDF.co API to start a PDF-to-TXT conversion with the async option set to true, which enables asynchronous processing.
  - The API response contains a job ID and the URL where the result file will be available once the job finishes.
- Checking Job Status:
  - The code uses the job ID to query the job's status periodically.
  - Once the status changes to "success", the conversion is complete and the result file is ready for download.
- Downloading the Result:
  - After the conversion completes, the code downloads the TXT file from the provided URL and saves it to the specified location on the local system.
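The status-checking loop described above can be sketched as follows. waitForJob takes the status check as a function, so the loop itself runs without network access; makePdfCoChecker shows one possible real implementation. The job-check URL, the "working" status name, and the status field in the response are assumptions based on PDF.co's documentation, so confirm them against the current API reference.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll checkStatus(jobId) until it reports "success".
// Resolves with the number of checks made; rejects on any status other
// than "working" (for example "failed") or after maxAttempts checks.
async function waitForJob(jobId, checkStatus, { intervalMs = 3000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus(jobId);
    if (status === "success") {
      return attempt; // the result file is ready to download
    }
    if (status !== "working") {
      throw new Error(`Job ${jobId} ended with status "${status}"`);
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs); // wait before asking again
    }
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} checks`);
}

// Hypothetical checker backed by PDF.co; endpoint and response shape are assumptions.
function makePdfCoChecker(apiKey, fetchImpl = globalThis.fetch) {
  return async (jobId) => {
    const url = `https://api.pdf.co/v1/job/check?jobid=${encodeURIComponent(jobId)}`;
    const response = await fetchImpl(url, { headers: { "x-api-key": apiKey } });
    const data = await response.json();
    return data.status;
  };
}
```

Capping the number of attempts keeps a stuck job from being polled forever; the interval and cap are placeholder values to tune for the file sizes you expect.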
In this tutorial, you’ve learned how to extract text from scanned PDFs in Asynchronous Mode using the PDF.co Web API. Because the conversion runs in the background, your application stays responsive and efficient while processing large or complex files.