Split PDF by Text in JavaScript using PDF.co Web API

In this tutorial, we will show you how to split a PDF from a URL by a search text in JavaScript using the PDF.co Web API with asynchronous processing. This approach is particularly beneficial for large PDF files.

Benefits of Asynchronous Mode

  1. Avoids Timeouts: Large files or complex operations can take a while to process. Asynchronous mode keeps your application from blocking or timing out while the server does the work.
  2. Optimized Workflow: Your application can continue executing other tasks while the server processes the PDF.
  3. Scalable: Asynchronous processing is ideal for applications handling multiple file conversions simultaneously.
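To illustrate the scalability point, here is a minimal sketch that starts several split jobs at once and waits for all of them with Promise.allSettled, so one failed file does not hide the others. The splitOne function is a hypothetical stand-in for whatever wraps the PDF.co request and job polling in your application; it is passed in so the sketch runs without network access.

```javascript
// Minimal sketch: run several split jobs concurrently.
// `splitOne` is a hypothetical async function (url) => result that would wrap
// the PDF.co request and job polling; it is injected here for illustration.
async function splitMany(sourceUrls, splitOne) {
    // Start every job immediately; none of them waits for the others.
    const jobs = sourceUrls.map((url) => splitOne(url));
    // Wait until every job has settled, recording success or failure per file.
    const results = await Promise.allSettled(jobs);
    return results.map((r, i) => ({
        url: sourceUrls[i],
        ok: r.status === "fulfilled",
        value: r.status === "fulfilled"
            ? r.value
            : (r.reason instanceof Error ? r.reason.message : String(r.reason))
    }));
}

// Usage with a fake splitter that resolves (or fails) after a short delay.
const fakeSplit = (url) =>
    new Promise((resolve, reject) =>
        setTimeout(() => (url.includes("bad") ? reject(new Error("failed")) : resolve(`${url} done`)), 10)
    );

splitMany(["a.pdf", "bad.pdf", "c.pdf"], fakeSplit).then((summary) => console.log(summary));
```

Promise.allSettled is used instead of Promise.all so that a single failed job still leaves you with the results of the others.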

For more details on asynchronous processing, refer to the PDF.co Asynchronous Processing Documentation.

Source File

Below are images of the 3-page input PDF and the three single-page output PDF files.


Output Files


Step-by-Step Guide

Source Code

Open your text editor and create a JavaScript file with the following code:

const https = require("https");
const fs = require("fs");
const axios = require("axios");

// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "********************";

// Source PDF file to split
const SourceFileUrl = "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf";
// Split Search String
const SplitText = "invoice number";

// Prepare request to `Split PDF By Text` API endpoint
const queryPath = "https://api.pdf.co/v1/pdf/split2";

// JSON payload for API request
const jsonPayload = {
    searchString: SplitText,
    url: SourceFileUrl,
    async: true
};

// Send POST request using axios
axios.post(queryPath, jsonPayload, {
    headers: {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
})
    .then(response => {
        const data = response.data;
        if (!data.error) {
            console.log(`Job #${data.jobId} has been created!`);
            checkIfJobIsCompleted(data.jobId, data.url);
        } else {
            // Service reported an error
            console.log(data.message);
        }
    })
    .catch(error => {
        console.error("Error during API request:", error.message);
    });

function checkIfJobIsCompleted(jobId, resultFileUrlJson) {
    const queryPath = "https://api.pdf.co/v1/job/check";

    // JSON payload for API request
    const jsonPayload = {
        jobid: jobId
    };

    // Send POST request using axios
    axios.post(queryPath, jsonPayload, {
        headers: {
            "x-api-key": API_KEY,
            "Content-Type": "application/json"
        }
    })
        .then(response => {
            const data = response.data;
            console.log(`Checking Job #${jobId}, Status: ${data.status}, Time: ${new Date().toLocaleString()}`);

            if (data.status === "working") {
                // Check again after 3 seconds
                setTimeout(() => {
                    checkIfJobIsCompleted(jobId, resultFileUrlJson);
                }, 3000);
            } else if (data.status === "success") {
                // Fetch the result files
                axios.get(resultFileUrlJson)
                    .then(res => {
                        const respJsonFileArray = res.data;
                        let part = 1;

                        respJsonFileArray.forEach((url) => {
                            const localFileName = `./part${part}.pdf`;
                            const file = fs.createWriteStream(localFileName);
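                            https.get(url, (response2) => {
                                response2.pipe(file).on("close", () => {
                                    console.log(`Generated PDF file saved as "${localFileName}".`);
                                });
                            }).on("error", (err) => {
                                // Report network errors instead of failing silently
                                console.error(`Error downloading ${url}:`, err.message);
                            });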
                            part++;
                        });
                    })
                    .catch(err => {
                        console.error("Error fetching result files:", err.message);
                    });
            } else {
                console.log(`Operation ended with status: "${data.status}".`);
            }
        })
        .catch(error => {
            console.error("Error during job status check:", error.message);
        });
}

Install Axios Module

Next, install the axios module, which the script uses to send HTTP requests. Type "npm install axios" in the terminal.

PDF.co API Key

On line 7, add your API key inside the double quotes of the API_KEY constant. You can get your PDF.co API key from your dashboard.
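Rather than hard-coding the key in the source file, you may prefer to read it from an environment variable. The variable name PDFCO_API_KEY and the getApiKey helper below are assumptions chosen for illustration; PDF.co does not require any particular name.

```javascript
// Sketch: load the API key from an environment variable instead of hard-coding it.
// PDFCO_API_KEY is a hypothetical variable name chosen for this example.
function getApiKey(env = process.env) {
    const key = (env.PDFCO_API_KEY || "").trim();
    if (!key) {
        throw new Error("PDFCO_API_KEY is not set. Export it before running node index.js.");
    }
    return key;
}

// In index.js you could then replace the hard-coded constant with:
// const API_KEY = getApiKey();
```

On macOS or Linux you would then start the program with something like: PDFCO_API_KEY=your_key node index.js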


Source and Destination Files

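  • On line 10, replace the value of the SourceFileUrl variable with the URL of your PDF.
  • On line 12, enter the text to split on in the SplitText variable.
  • The output files are saved as part1.pdf, part2.pdf, and so on in the current directory.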
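If you reuse the script for different documents, a small helper can validate both inputs before building the same JSON payload the sample sends (searchString, url, async). The name buildSplitPayload is hypothetical; only the three payload fields come from the sample code.

```javascript
// Sketch: build the request body the sample code sends to /v1/pdf/split2.
// buildSplitPayload is a hypothetical helper; its fields mirror jsonPayload in the sample.
function buildSplitPayload(sourceFileUrl, splitText) {
    // new URL() throws a TypeError if the string is not an absolute URL.
    const parsed = new URL(sourceFileUrl);
    if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
        throw new Error(`Unsupported URL protocol: ${parsed.protocol}`);
    }
    if (typeof splitText !== "string" || splitText.trim() === "") {
        throw new Error("The split search string must be a non-empty string.");
    }
    return {
        searchString: splitText,
        url: sourceFileUrl,
        async: true // process in the background and return a job ID
    };
}

console.log(buildSplitPayload("https://example.com/invoices.pdf", "invoice number"));
```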

Run the Program

  • Save your JavaScript file as index.js.
  • Open a terminal, navigate to the project directory, and run: node index.js

How It Works

Starting an Asynchronous Split Job:

  • The code sends a POST request to the /v1/pdf/split2 endpoint with the source file URL, the search string, and the async option set to true.
  • The API responds right away with a job ID and a URL where the result, a JSON file listing the URLs of the output PDFs, becomes available once the job finishes.
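The same request can be written with async/await. In this sketch the HTTP client is passed in as post, which is assumed to behave like axios.post(url, data, config) and resolve to an object with a data property; this lets the logic run without contacting the API. The function name startSplitJob is hypothetical.

```javascript
// Sketch: start an asynchronous split job and return its job ID and result URL.
// `post` is assumed to behave like axios.post(url, data, config) and resolve to { data }.
async function startSplitJob(post, apiKey, payload) {
    const response = await post("https://api.pdf.co/v1/pdf/split2", payload, {
        headers: { "x-api-key": apiKey, "Content-Type": "application/json" }
    });
    const data = response.data;
    if (data.error) {
        // The service reports problems in the response body, as in the sample code.
        throw new Error(data.message);
    }
    // jobId is used for polling; url is where the list of output files will appear.
    return { jobId: data.jobId, resultUrl: data.url };
}

// Real usage (requires axios and a valid key):
// const axios = require("axios");
// startSplitJob(axios.post, API_KEY, jsonPayload).then(console.log);
```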

Checking Job Status:

  • The code sends the job ID to the /v1/job/check endpoint every 3 seconds for as long as the status is "working".
  • Once the status changes to "success", the split is complete and the output files are ready for download. Any other status ends the checks and is printed to the console.
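The recursive setTimeout loop in the sample keeps polling for as long as the job reports "working". Below is a sketch of the same loop written with async/await and an upper bound on the number of checks. The names waitForJob, intervalMs, maxAttempts, and the injected checkStatus function are assumptions for illustration; the "working" status string is the one the sample code tests for.

```javascript
// Sketch: poll a job until it leaves the "working" state, with a retry limit.
// checkStatus is a hypothetical async function (jobId) => status string; in practice
// it would POST { jobid } to https://api.pdf.co/v1/job/check and return data.status.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForJob(jobId, checkStatus, { intervalMs = 3000, maxAttempts = 100 } = {}) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const status = await checkStatus(jobId);
        console.log(`Checking Job #${jobId}, attempt ${attempt}, status: ${status}`);
        if (status !== "working") {
            return status; // "success" or any other final status
        }
        await sleep(intervalMs); // wait before the next check
    }
    throw new Error(`Job #${jobId} still working after ${maxAttempts} checks`);
}
```

The caller would go on to download the results only when the returned status is "success", mirroring the branches in checkIfJobIsCompleted.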

Downloading the Result:

  • After the job succeeds, the code downloads the JSON file from the result URL, reads the array of output file URLs, and saves each file locally as part1.pdf, part2.pdf, and so on in the current directory.
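The sample starts each download without tracking when it finishes. A sketch of a download step that resolves only after every file has been written is shown below. File names follow the sample's part1.pdf, part2.pdf pattern; downloadParts and httpsDownload are hypothetical names, and httpsDownload is just one possible implementation built on Node's https and fs modules.

```javascript
const https = require("https");
const fs = require("fs");
const path = require("path");

// One possible downloader: stream a URL to a local file and resolve once it is closed.
function httpsDownload(url, destination) {
    return new Promise((resolve, reject) => {
        https.get(url, (res) => {
            if (res.statusCode !== 200) {
                res.resume(); // discard the body so the socket is released
                reject(new Error(`HTTP ${res.statusCode} for ${url}`));
                return;
            }
            // Create the file only after a successful response, to avoid empty files.
            const file = fs.createWriteStream(destination);
            res.pipe(file);
            file.on("close", () => resolve(destination));
            file.on("error", reject);
        }).on("error", reject);
    });
}

// Save each URL as part1.pdf, part2.pdf, ... inside outDir, downloading in parallel.
async function downloadParts(urls, outDir = ".", download = httpsDownload) {
    const targets = urls.map((url, i) => path.join(outDir, `part${i + 1}.pdf`));
    await Promise.all(urls.map((url, i) => download(url, targets[i])));
    return targets;
}
```

Passing the downloader as a parameter keeps downloadParts easy to test and lets you swap in axios or another client if you prefer.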

In this tutorial, you learned how to split a PDF from a URL by a search text in JavaScript using the PDF.co Web API with asynchronous processing. By taking advantage of async mode, your application can handle large files without blocking or timing out. You also saw how to get started quickly by modifying just a few lines of the sample code.
