How to Search, Replace and Delete Text in PDF using PDF.co Web API

This tutorial and its sample source code show how to search, replace, and delete text in a PDF document using the PDF.co Web API in JavaScript (Node.js). Users can use this web API to replace or delete a single word or multiple words in a PDF file.

This tutorial uses two endpoints, one to replace text and one to delete text, and covers each of them separately.

How to Search and Replace Text

Users can use PDF.co’s ‘replace-text’ endpoint to search for and replace any text of their choice in a PDF document, whether it is uploaded to the cloud or available locally on the system. The endpoint can replace a single word or multiple words in a document and creates a new PDF document containing the replaced text.

Endpoint Parameters

Following are the endpoint parameters for single-word replacement:

1. url

It is a required parameter: a string containing the source PDF document’s URL. The PDF.co API supports links from Google Drive, Dropbox, and its own built-in storage, where users can upload their source files.

2. cache

It is an optional parameter to enable or disable built-in URL caching. It is set to false by default.

3. httpusername

It is an optional parameter that becomes necessary only if the source URL requires an HTTP auth user name to access the source file.

4. httppassword

It is an optional parameter that becomes necessary only if the source URL requires an HTTP auth password to access the source file.

5. searchString

It is a required parameter: a string containing the single word that the user wants to replace.

6. replaceString

It is a required parameter: a string containing the single word that replaces the searched word.

7. caseSensitive

It is an optional parameter to choose between case-sensitive and case-insensitive search. It is set to “false” by default, which means the search is case-insensitive.

8. name

It is an optional parameter: a string containing the name of the output file.

9. expiration

It is an optional parameter defining the output file’s link expiration in minutes. It is set to 60 minutes by default, and users can set it to a different period depending on their subscription plan. The files are automatically deleted from the cloud after this period. However, users can store them permanently using the PDF.co built-in file storage system.

10. password

It is an optional parameter: a string containing the password to the source file if it is password protected.

11. pages

It is an optional parameter containing the comma-separated list of the page numbers that the API has to process.

12. encrypt

It is an optional parameter to enable or disable the encryption of the output file. It is set to “false” by default.

13. async

It is an optional parameter to run the processing asynchronously. It is set to “false” by default.

14. profiles

It is an optional parameter, a string that allows the users to set custom configurations.
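The parameter list above can be summarized in a small sketch. The helper name “buildReplaceParams” and the example URL are hypothetical and are not part of the PDF.co API. The helper only checks the three required parameters and fills in the default values stated in the list.

```javascript
// Hypothetical helper (not part of the PDF.co API): validates the required
// parameters and applies the defaults described in the parameter list.
function buildReplaceParams(params) {
  var required = ['url', 'searchString', 'replaceString'];
  required.forEach(function (key) {
    if (typeof params[key] !== 'string') {
      throw new Error('Missing required parameter: ' + key);
    }
  });

  // Defaults as stated in the parameter descriptions above
  var defaults = {
    cache: false,
    caseSensitive: false,
    expiration: 60,
    encrypt: false,
    async: false
  };

  // Values supplied by the caller override the defaults
  return Object.assign({}, defaults, params);
}

var params = buildReplaceParams({
  url: 'https://example.com/sample.pdf', // placeholder URL
  searchString: 'Martian',
  replaceString: 'Jovian',
  name: 'result.pdf'
});
console.log(params);
```

Passing an object without “url”, “searchString”, or “replaceString” makes the helper throw before any request is sent.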

Multiple Replacements

Users can replace multiple words in a PDF file by using the “searchStrings[]” and “replaceStrings[]” parameters instead of “searchString” and “replaceString”. These parameters are string arrays whose elements are separated by commas. Below is an example of a basic string array:

"searchStrings": [
        "Solar System",
        "Galaxy, Milky way"
    ]
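As a rough sketch, the two arrays could be built from a list of search and replacement pairs. The helper name “buildMultiReplaceBody” and the example URL are hypothetical. The sketch also assumes that the API pairs the arrays by position, so that the first search string is replaced by the first replacement string, which this tutorial does not state explicitly.

```javascript
// Hypothetical helper: splits [search, replacement] pairs into the two
// parallel arrays named in the text above.
// Assumption: the API matches searchStrings[i] with replaceStrings[i].
function buildMultiReplaceBody(url, pairs) {
  if (pairs.length === 0) {
    throw new Error('At least one search/replace pair is needed');
  }
  return {
    url: url,
    searchStrings: pairs.map(function (pair) { return pair[0]; }),
    replaceStrings: pairs.map(function (pair) { return pair[1]; })
  };
}

var body = buildMultiReplaceBody('https://example.com/sample.pdf', [
  ['Solar System', 'Planetary System'],
  ['Galaxy', 'Star Cluster']
]);
console.log(JSON.stringify(body, null, 2));
```

Building both arrays from one list of pairs keeps them the same length, which avoids a search string without a matching replacement.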

Example Using JavaScript

The following source code shows how to search and replace a single word in a PDF file using the PDF.co web API. This code takes the “Martian Dichotomy” file, hosted on Bytescout, as the source PDF file and replaces text in it. Users can substitute their own file URL, search word, and replacement word in the code below to replace words in their PDF files.

For the API to work, the request must include an API key obtained by signing in to PDF.co, the PDF file URL, and the search and replace words. The PDF.co API returns the URL of the edited file, from which users can download the file and keep it in their local storage. The result.pdf file contains a copy of the source PDF file with the replaced words, and users can download it using the file stream modules in JavaScript.
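The download step described above could be sketched as follows, using only Node.js built-in modules. The function names are hypothetical. The sketch assumes the response body is JSON with an “error” flag, a “message” on failure, and a “url” field pointing at the output file, so these field names should be checked against an actual response.

```javascript
var fs = require('fs');
var https = require('https');

// Assumption: the API response is JSON containing "error", "message",
// and "url" fields. Returns the output file URL or throws.
function extractResultUrl(responseBody) {
  var data = JSON.parse(responseBody);
  if (data.error || !data.url) {
    throw new Error(data.message || 'No output URL in the API response');
  }
  return data.url;
}

// Streams the file at fileUrl into destPath and calls done(err) at the end.
function downloadFile(fileUrl, destPath, done) {
  https.get(fileUrl, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the response body
      return done(new Error('Download failed with status ' + res.statusCode));
    }
    var file = fs.createWriteStream(destPath);
    res.pipe(file);
    file.on('finish', function () { done(null); });
    file.on('error', done);
  }).on('error', done);
}

// Offline demonstration with a made-up response body
var sample = '{"error": false, "url": "https://example.com/result.pdf"}';
console.log(extractResultUrl(sample));
```

Inside the request callback, the two functions could be combined as downloadFile(extractResultUrl(response.body), 'result.pdf', callback).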

Source Code

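// HTTP client used to call the PDF.co API (install with: npm install request)
var request = require('request');

// API key from the PDF.co dashboard
var API_KEY = '*********************';

// Replaces one word in the source PDF with another word
function searchReplace() {
    var options = {
        'method': 'POST',
        'url': 'https://api.pdf.co/v1/pdf/edit/replace-text',
        'headers': {
          'x-api-key': API_KEY
        },
        formData: {
          // Source PDF, output file name, and the search/replace pair
          'url': 'https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-split/sample.pdf',
          'name': 'result.pdf',
          'searchString': 'Martian',
          'replaceString': 'Jupiternonians',
          'caseSensitive': 'true'
        }
    };

    // Send the request and print the API response, which includes the output file URL
    request(options, function (error, response) {
        if (error) throw new Error(error);
        console.log(response.body);
    });
}

searchReplace();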

Step-by-Step Guide

  1. Install the “request” module using “npm install request” and import it into the code.

  2. Declare the “API_KEY” variable and initialize it with the API key from the PDF.co dashboard (sign in to obtain one). Users can keep the key in an env file so that it stays out of the source code and can be shared across files.

  3. Create a function named “searchReplace”. This step is optional: users who prefer not to wrap the code in a function can write it at the top level instead.

  4. Inside the function, declare a variable named “options” and initialize it with the request options, such as the method, the endpoint URL, and the headers.

  5. Inside “options”, add a “formData” property containing the endpoint parameters, for instance url, name, searchString, and replaceString.

  6. Pass “options” to “request” and print the response body in the callback. Finally, call the function and check the output in the terminal.
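The env file mentioned in step 2 could be used as in the minimal sketch below. The variable name “PDFCO_API_KEY” and the helper “getApiKey” are hypothetical. The sketch assumes the variable has already been loaded into the process environment, for example by exporting it in the shell or by a loader such as the “dotenv” package.

```javascript
// Hypothetical helper: reads the API key from an environment object
// (process.env by default) instead of hard-coding it in the source file.
function getApiKey(env) {
  var key = (env || process.env).PDFCO_API_KEY; // hypothetical variable name
  if (!key) {
    throw new Error('PDFCO_API_KEY is not set');
  }
  return key;
}

// Demonstration with an explicit environment object
console.log(getApiKey({ PDFCO_API_KEY: 'demo-key' }));
```

In the example above, “var API_KEY = getApiKey();” would then take the place of the hard-coded key.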

Output Document

Below are the screenshots of the code output and the output file obtained in the API response:

[Screenshots: Output Document; PDF Editor Web API]

Note that the API replaces a word with another word but does not make room for the new word. It writes the new word into the available space, so if the new word is longer than the replaced word, it overwrites the words that follow it in the file.
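Because of this behavior, it may help to flag risky replacements before sending a request. The following is a minimal sketch with a hypothetical helper name. It compares character counts only, which is a rough proxy, since the rendered width also depends on the font used in the PDF.

```javascript
// Hypothetical helper: returns the pairs whose replacement has more
// characters than the searched word and may therefore overwrite the
// text that follows it.
function findLongerReplacements(pairs) {
  return pairs.filter(function (pair) {
    return pair.replace.length > pair.search.length;
  });
}

var risky = findLongerReplacements([
  { search: 'Martian', replace: 'Jupiternonians' },
  { search: 'Galaxy', replace: 'Nebula' }
]);
console.log(risky);
```

Here only the first pair is reported, because “Jupiternonians” has more characters than “Martian”, while “Nebula” and “Galaxy” have the same length.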

How to Search and Delete Text

Users can use PDF.co’s ‘delete-text’ endpoint to search for and delete any unwanted text from a PDF document, whether it is uploaded to the cloud or available locally on the system. The endpoint can delete a single word or multiple words from a document and creates a new PDF document without the unwanted text.

Endpoint Parameters

Following are the endpoint parameters:

  1. url

  2. cache

  3. httpusername

  4. httppassword

  5. searchString or searchStrings[]

  6. caseSensitive

  7. name

  8. expiration

  9. password

  10. pages

  11. encrypt

  12. async

  13. profiles

These parameters are the same as those of the ‘replace-text’ endpoint above, except that this endpoint does not require any “replaceString”.
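Since the two parameter sets differ only in the replacement strings, one set can be derived from the other. The sketch below uses a hypothetical helper name and a placeholder URL, and it leaves the original object unchanged.

```javascript
// Hypothetical helper: copies a set of replace-text parameters and drops
// the replacement fields, which the delete-text endpoint does not need.
function toDeleteParams(replaceParams) {
  var deleteParams = Object.assign({}, replaceParams);
  delete deleteParams.replaceString;
  delete deleteParams.replaceStrings;
  return deleteParams;
}

var deleteParams = toDeleteParams({
  url: 'https://example.com/result.pdf', // placeholder URL
  name: 'result.pdf',
  searchString: 'Jupiternonians',
  replaceString: 'Martian',
  caseSensitive: 'true'
});
console.log(deleteParams);
```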

Example Using JavaScript

The following source code shows how to search and delete a single word from a PDF file using the PDF.co web API. This code takes the output file generated in the above example as its source file and deletes the replaced word, “Jupiternonians” in this example, from it. Users can substitute their own file URL and search word in the code below to delete text from their PDF files.

Source Code

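// HTTP client used to call the PDF.co API (install with: npm install request)
var request = require('request');

// API key from the PDF.co dashboard
var API_KEY = '***************';

// Deletes one word from the source PDF
function searchDelete() {
    var options = {
        'method': 'POST',
        'url': 'https://api.pdf.co/v1/pdf/edit/delete-text',
        'headers': {
          'x-api-key': API_KEY
        },
        formData: {
          // Source PDF (the output of the previous example), output file name, and the word to delete
          'url': 'https://pdf-temp-files.s3.amazonaws.com/82610017d8bb460f87f3391612e97db9/result.pdf',
          'name': 'result.pdf',
          'searchString': 'Jupiternonians',
          'caseSensitive': 'true'
        }
    };

    // Send the request and print the API response, which includes the output file URL
    request(options, function (error, response) {
        if (error) throw new Error(error);
        console.log(response.body);
    });
}

searchDelete();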

Output Document

Below are the screenshots of the code output and the output file obtained in the API response:

[Screenshots: Output Document; Search and Delete Text from PDF]

In this tutorial, you used sample source code to learn how to search, replace, and delete text in a PDF document using the PDF.co Web API in JavaScript (Node.js).