Split PDF by Text in JavaScript using PDF.co Web API
In this tutorial, we will show you how to split a PDF from a URL by text in JavaScript using the PDF.co Web API with asynchronous processing. This approach is particularly useful for large PDF files.
Benefits of Asynchronous Mode
- Avoids Timeouts: Large files or complex conversions may take time. Asynchronous mode prevents your application from being blocked or timing out during these processes.
- Optimized Workflow: Your application can continue executing other tasks while the server processes the PDF.
- Scalable: Asynchronous processing is ideal for applications handling multiple file conversions simultaneously.
For more details on asynchronous processing, refer to the PDF.co Asynchronous Processing Documentation.
Source and Output Files
The sample input is a 3-page PDF, and the output is three single-page PDF files.
Step-by-Step Guide
Source Code
Open your text editor and create a JavaScript file with the following code:
const https = require("https");
const fs = require("fs");
const axios = require("axios");

// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "********************";

// Source PDF file to split
const SourceFileUrl = "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf";
// Split Search String
const SplitText = "invoice number";

// Prepare request to `Split PDF By Text` API endpoint
const queryPath = "https://api.pdf.co/v1/pdf/split2";

// JSON payload for API request
const jsonPayload = {
    searchString: SplitText,
    url: SourceFileUrl,
    async: true
};

// Send POST request using axios
axios.post(queryPath, jsonPayload, {
    headers: {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
})
    .then(response => {
        const data = response.data;
        if (!data.error) {
            console.log(`Job #${data.jobId} has been created!`);
            checkIfJobIsCompleted(data.jobId, data.url);
        } else {
            // Service reported an error
            console.log(data.message);
        }
    })
    .catch(error => {
        console.error("Error during API request:", error.message);
    });

function checkIfJobIsCompleted(jobId, resultFileUrlJson) {
    const queryPath = "https://api.pdf.co/v1/job/check";

    // JSON payload for API request
    const jsonPayload = {
        jobid: jobId
    };

    // Send POST request using axios
    axios.post(queryPath, jsonPayload, {
        headers: {
            "x-api-key": API_KEY,
            "Content-Type": "application/json"
        }
    })
        .then(response => {
            const data = response.data;
            console.log(`Checking Job #${jobId}, Status: ${data.status}, Time: ${new Date().toLocaleString()}`);
            if (data.status === "working") {
                // Check again after 3 seconds
                setTimeout(() => {
                    checkIfJobIsCompleted(jobId, resultFileUrlJson);
                }, 3000);
            } else if (data.status === "success") {
                // Fetch the JSON array of result file URLs
                axios.get(resultFileUrlJson)
                    .then(res => {
                        const respJsonFileArray = res.data;
                        let part = 1;
                        respJsonFileArray.forEach((url) => {
                            // Each split part is saved as part1.pdf, part2.pdf, ...
                            const localFileName = `./part${part}.pdf`;
                            const file = fs.createWriteStream(localFileName);
                            https.get(url, (response2) => {
                                response2.pipe(file).on("close", () => {
                                    console.log(`Generated PDF file saved as "${localFileName}" file.`);
                                });
                            }).on("error", (err) => {
                                // Report network failures for this part
                                console.error(`Error downloading "${localFileName}":`, err.message);
                            });
                            part++;
                        });
                    })
                    .catch(err => {
                        console.error("Error fetching result files:", err.message);
                    });
            } else {
                console.log(`Operation ended with status: "${data.status}".`);
            }
        })
        .catch(error => {
            console.error("Error during job status check:", error.message);
        });
}
Install Axios Module
Next, install the axios module, which the script uses to send HTTP requests to the API. Run npm install axios in the terminal.
PDF.co API Key
Set the API_KEY constant to your API key, inside the double quotes. You can get your PDF.co API key from your dashboard.
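If you would rather not keep the key in the source file, one option is to read it from an environment variable. The sketch below is only an illustration: the helper name getApiKey and the variable name PDFCO_API_KEY are assumptions, not part of the PDF.co sample.

```javascript
// Hypothetical helper: read the PDF.co API key from an environment variable.
// The variable name PDFCO_API_KEY is an assumption made for this sketch.
function getApiKey(env = process.env) {
  const key = (env.PDFCO_API_KEY || "").trim();
  if (!key) {
    throw new Error("Set the PDFCO_API_KEY environment variable to your PDF.co API key.");
  }
  return key;
}
```

With this in place, the constant could be assigned as const API_KEY = getApiKey(); and the script started with the variable set in your shell.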
Source and Destination Files
- Set the SourceFileUrl variable to the URL of your PDF.
- Set the SplitText variable to the search string to split on.
Run the Program
- Save your JavaScript file as index.js.
- Open a terminal, navigate to the project directory, and run:
node index.js
How It Works
Initiating an Asynchronous Conversion:
- The code sends a request to the PDF.co API to start splitting the PDF, with the async option set to true to enable asynchronous processing.
- The API response contains a job ID, used to monitor the job's status, and a URL from which the list of result files is fetched once the job succeeds.
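As a minimal sketch of this first step, the response body can be validated before polling starts. The field names error, message, jobId, and url are the ones the sample code reads; the helper name parseJobResponse and the returned object shape are assumptions for illustration.

```javascript
// Sketch: validate the body of the initial split response before polling.
// parseJobResponse is a hypothetical helper, not part of the PDF.co sample.
function parseJobResponse(data) {
  if (!data || data.error) {
    // The service signals failures with error set and a message
    throw new Error((data && data.message) || "PDF.co reported an unknown error");
  }
  if (!data.jobId || !data.url) {
    throw new Error("Response is missing jobId or url");
  }
  return { jobId: data.jobId, resultUrl: data.url };
}
```

Throwing here lets a single catch handler deal with both service errors and unexpected responses, instead of passing an undefined job ID on to the status check.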
Checking Job Status:
- The code uses the job ID to query the job's status every 3 seconds by sending requests to the job check endpoint.
- Once the status changes to "success", the split is complete and the result files are ready for download.
Downloading the Result:
- After the job completes successfully, the code fetches the list of result URLs, downloads each split PDF, and saves it in the current directory as part1.pdf, part2.pdf, and so on.
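The polling and download steps can also be written with async/await, which makes it easier to cap the number of status checks and to surface download errors. This is a sketch under stated assumptions rather than the PDF.co sample itself: the names waitForJob, planDownloads, downloadFile, and downloadAll are hypothetical, the limit of 40 checks is an arbitrary default, and the stream/promises module requires Node.js 15 or later.

```javascript
const fs = require("fs");
const https = require("https");
const path = require("path");
const { pipeline } = require("stream/promises");

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the job leaves the "working" state or the attempts run out.
// checkStatus is a caller-supplied async function that returns the status string.
async function waitForJob(checkStatus, { intervalMs = 3000, maxAttempts = 40 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status !== "working") {
      return status; // "success" or any other final status
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Job is still working after ${maxAttempts} checks`);
}

// Map each result URL to a numbered local path: part1.pdf, part2.pdf, ...
function planDownloads(urls, outputDir = ".") {
  return urls.map((url, index) => ({
    url,
    filePath: path.join(outputDir, `part${index + 1}.pdf`),
  }));
}

// Download one file over HTTPS, rejecting on network errors or non-200 responses.
function downloadFile(url, filePath) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Download failed with HTTP ${res.statusCode}`));
        return;
      }
      pipeline(res, fs.createWriteStream(filePath)).then(() => resolve(filePath), reject);
    }).on("error", reject);
  });
}

// Download all parts one after another and return the saved paths.
async function downloadAll(urls, outputDir = ".") {
  const saved = [];
  for (const { url, filePath } of planDownloads(urls, outputDir)) {
    saved.push(await downloadFile(url, filePath));
  }
  return saved;
}
```

A possible call site would pass waitForJob a function that posts { jobid: jobId } to https://api.pdf.co/v1/job/check and returns data.status, then call downloadAll with the array fetched from the result URL when the returned status is "success".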
In this tutorial, you learned how to split a PDF by text from a URL in JavaScript using the PDF.co Web API with asynchronous processing. With async mode, your application can handle large files without blocking or timing out while the server works. You also saw how to get started by changing just a few values in the sample code.