The PDF.co PDF Search API covers several search functions: searching for text, removing or replacing text with other text or an image, and making a PDF document searchable.

The API can make multiple text replacements, return text coordinates, run advanced pattern searches using regular expressions, and convert scanned PDFs and images into text-searchable PDFs.

All documents transmitted through our API are encrypted and secure. We use SSL, TLS, and file encryption to protect your data. To learn more, please read our security page: https://pdf.co/security

PDF Search API Benefits

Regular Expression Support

The PDF Search API supports advanced pattern search in addition to regular text search. Pattern search uses a regular expression (regex) to return only the text or data that matches the given expression.
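
As a small illustration of pattern search, the sketch below uses Python's built-in re module to check locally what the regular expression from the demo request on this page would match. The sample text and the helper name find_matches are made up for illustration, and the regex engine behind the API may differ from Python's in some details, so treat this as a local sanity check rather than a description of the service.

```python
import re

# Pattern taken from the demo request on this page: the literal words
# "Invoice Date" followed by three slash-separated numbers.
PATTERN = r"Invoice Date \d+\/\d+\/\d+"

# Hypothetical page text, used only for illustration.
sample_text = "Invoice No 123456  Invoice Date 01/01/2016  Total 1,915.07"


def find_matches(pattern: str, text: str) -> list[str]:
    """Return every substring of `text` that matches `pattern`."""
    return [m.group(0) for m in re.finditer(pattern, text)]


print(find_matches(PATTERN, sample_text))
```

Testing a pattern locally like this before sending it as searchString can save a few round trips to the API.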

Get Text Coordinates

The PDF Search API returns the coordinates of each text search result. This information is very helpful when recreating the PDF or when parsing specific data with the extraction or document parsing module.
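
To make the idea of text coordinates more concrete, here is a sketch of reading them from a search result. The response shape used here (a body list whose items carry text, left, top, width, height, and pageIndex) is an assumption for illustration, as are the Match class and parse_matches helper; check an actual response from your account before relying on these field names.

```python
import json
from dataclasses import dataclass


@dataclass
class Match:
    """One search hit with its position on the page (assumed fields)."""
    text: str
    page_index: int
    left: float
    top: float
    width: float
    height: float


def parse_matches(response_text: str) -> list[Match]:
    """Convert a JSON response string into a list of Match objects."""
    data = json.loads(response_text)
    return [
        Match(
            text=item["text"],
            page_index=item["pageIndex"],
            left=item["left"],
            top=item["top"],
            width=item["width"],
            height=item["height"],
        )
        for item in data.get("body", [])
    ]


# Hypothetical response, invented for this example.
SAMPLE_RESPONSE = """
{"body": [{"text": "Invoice Date 01/01/2016", "pageIndex": 0,
           "left": 436.5, "top": 130.5, "width": 122.0, "height": 11.0}]}
"""

for match in parse_matches(SAMPLE_RESPONSE):
    print(match.text, "on page", match.page_index, "at", (match.left, match.top))
```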

Scanned PDF and Images Conversion

The PDF Search API converts scanned PDFs, whether partially or fully made from scanned images, into text-searchable PDFs. It runs OCR and adds an invisible text layer on top of your document that can be used for text search, text indexing, etc.

Text and Image Replacement Support

The PDF Search API can search for text and replace it with either text or an image. The search can be narrowed down to specific pages or a page range, or it can cover the whole document.
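
Page selection is passed as a string in the pages parameter, described on this page as a comma-separated list of page indices or ranges. The helper below is a hypothetical convenience for building that string. It assumes that an open-ended range such as the 0- used in the demo request means "from this page to the end of the document".

```python
from typing import Optional, Sequence, Tuple, Union

# A page spec is either a single index or a (start, end) range.
# An end of None produces an open-ended range such as "0-".
PageSpec = Union[int, Tuple[int, Optional[int]]]


def format_pages(specs: Sequence[PageSpec]) -> str:
    """Build a comma-separated pages string from indices and ranges."""
    parts = []
    for spec in specs:
        if isinstance(spec, int):
            if spec < 0:
                raise ValueError("page index must not be negative")
            parts.append(str(spec))
        else:
            start, end = spec
            if start < 0 or (end is not None and end < start):
                raise ValueError(f"invalid page range: {spec!r}")
            parts.append(f"{start}-" if end is None else f"{start}-{end}")
    return ",".join(parts)


print(format_pages([0, 2, (5, 7)]))  # prints 0,2,5-7
print(format_pages([(0, None)]))     # prints 0-
```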

On-Premise API Server

We have an API Server that allows businesses and enterprises to run the PDF.co API modules on their on-premise servers. This gives them more control over file storage and other customizations. To request a free trial, please send an email to our support team.

Secure, Encrypted, and Certified Cloud Storage

When you upload input files for processing with our API, they are stored securely in temporary cloud storage for a limited amount of time and are then removed automatically. You can also use the API to remove both source files and output files permanently at any time.

High-Quality PDF Generation

The PDF.co platform converts images and scanned PDFs into high-quality PDF files that can be searched for text. The built-in OCR engine supports multiple languages, including English, Spanish, German, Chinese, Japanese, and others.

API and Business Automation Platforms Integrations

Software developers can use the PDF.co platform from programming languages such as JavaScript, PHP, Java, .NET and ASP.NET, C#, Visual Basic, and many others.

If you are not a developer, you can also easily automate your PDF operations through business automation platforms such as Zapier, Integromat, and hundreds of others.

PDF Search API Integrations

The PDF.co API integrates with more than 300 online apps. See below for sample tutorials.

PDF Search API Sample & Demo

In this demonstration, we will find the Invoice Date in a PDF invoice using a search string that combines plain text and a regular expression. We will set the inline parameter to true so that we can view the result in the response body. When the inline parameter is set to false, PDF.co returns a downloadable JSON file with the contents of the PDF Search Text API result.
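
The two inline modes call for slightly different handling on the client side. The sketch below shows one way to treat both cases uniformly. It assumes that an inline response carries the matches under a body key and that a non-inline response carries a link to the JSON file under a url key; both field names, and the helper names, are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Callable


def download_json(url: str):
    """Download a JSON document and return the parsed content."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_results(response: dict,
                    fetch: Callable[[str], object] = download_json):
    """Return the search results for either inline mode.

    inline=true  -> results are assumed to be in response["body"].
    inline=false -> results are assumed to be in a JSON file at response["url"].
    """
    if "body" in response:
        return response["body"]
    if "url" in response:
        return fetch(response["url"])
    raise ValueError("response has neither 'body' nor 'url'")
```

Passing the downloader in as a parameter keeps the function easy to exercise without network access, for example by supplying a stub in place of download_json.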

Below are the images of our source PDF Invoice and output.

Images of sample PDF Invoice and Output

Let’s review the /v1/pdf/find endpoint’s parameters and their corresponding functions.

Endpoint

URL: https://api.pdf.co/v1/pdf/find
Method: POST

Parameter: Description
url: required. Link to the source file.
searchString: text to search for. Can contain a regex.
pages: optional. Comma-separated list of page indices (or ranges) to process.
inline: optional. Must be one of true, false.
wordMatchingMode: optional. Must be a String.
password: optional. The password of the PDF file. Must be a String.
regexSearch: optional. Must be one of true, false.
encrypt: optional. Enables encryption of the output file.
async: optional. Must be one of true, false. Runs processing asynchronously and returns a jobId to use with job/check.
name: optional. Output file name.
profiles: optional. Must be a String. Sets a custom configuration. See the profiles examples.
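
When async is set to true, the call returns a jobId to use with job/check, as noted in the parameter list above. A polling loop for that might look like the sketch below. The full job/check URL, the jobid request field, and the status values ("working" while the job is still running) are assumptions here, not details confirmed on this page.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed full URL for the job/check call mentioned in the parameter list.
JOB_CHECK_URL = "https://api.pdf.co/v1/job/check"


def check_job(job_id: str, api_key: str) -> dict:
    """Ask the API for the current state of a job (assumed request shape)."""
    req = urllib.request.Request(
        JOB_CHECK_URL,
        data=json.dumps({"jobid": job_id}).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_job(job_id: str,
                 check: Callable[[str], dict],
                 interval: float = 3.0,
                 max_attempts: int = 20,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `check` until the job is no longer reported as "working"."""
    for _ in range(max_attempts):
        state = check(job_id)
        if state.get("status") != "working":  # assumed "still running" value
            return state
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish in {max_attempts} checks")


# Usage with a real key (not executed here):
# final = wait_for_job(job_id, lambda j: check_job(j, "YOUR_API_KEY"))
```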

 

cURL Code Snippet

curl --location --request POST 'https://api.pdf.co/v1/pdf/find' \
--header 'x-api-key: {{x-api-key}}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "async": "false",
    "encrypt": "false",
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "searchString": "Invoice Date \\d+\\/\\d+\\/\\d+",
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
    "password": ""
}'

The PDF Search Text API cURL sample code is available here.
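
For readers who prefer a script to cURL, the request above can be sketched in Python using only the standard library. The payload mirrors the cURL example, including sending the true/false flags as strings. The function names and the YOUR_API_KEY placeholder are hypothetical, and the actual call is left commented out so the block runs without network access.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/find"


def flag(value: bool) -> str:
    """Render a boolean the way the cURL example sends it: "true" or "false"."""
    return "true" if value else "false"


def build_find_payload(url: str, search_string: str, regex: bool = False,
                       pages: str = "0-", inline: bool = True) -> dict:
    """Assemble the JSON body for a search request."""
    return {
        "async": flag(False),
        "url": url,
        "searchString": search_string,
        "regexSearch": flag(regex),
        "pages": pages,
        "inline": flag(inline),
    }


def find_text(api_key: str, payload: dict) -> dict:
    """POST the payload to the endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = build_find_payload(
    url="https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    search_string=r"Invoice Date \d+\/\d+\/\d+",
    regex=True,
)
print(json.dumps(payload, indent=2))
# result = find_text("YOUR_API_KEY", payload)  # requires a real API key
```

Using a raw string for the pattern avoids doubling the backslashes by hand; json.dumps takes care of escaping them in the request body.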

Let’s see the PDF Search Text API in action.

PDF Search Text API Demonstration

PDF Search Text API sample code is also available for JavaScript, Python, Java, and C#.

Related Samples: