How to Convert PDF to JSON Using Automation Anywhere and PDF.co

About Automation Anywhere

Automation Anywhere is a technology company headquartered in San Jose, California, USA. It provides an end-to-end Robotic Process Automation (RPA) and Artificial Intelligence (AI) platform for automating business processes. Its products include Discovery Bot for fast-track automation, IQ Bot for transforming unstructured and hidden data with AI that can learn on its own, Bot Insight for real-time analytics, and a Bot Store for pre-built automation solutions.

We prepared this step-by-step tutorial with screenshots to show how to build a bot that uses PDF.co to convert PDF to JSON.

In this tutorial, we assume that you already have a Community or Enterprise version of Automation Anywhere, with the Chrome extension and the client agent installed. We also assume that you have basic knowledge of how to create a simple bot and run it on your machine.

Useful resources:

PDF.co API Endpoint

This tutorial uses the PDF.co API endpoint for converting PDF to JSON. The PDF.co method documentation, available through Postman or online at https://developer.pdf.co, is useful for configuring the API call as needed.

Screenshot of PDF.co API Endpoint on Postman

Bot Creation

In the Automation Anywhere cloud Control Room, create a new bot for extracting JSON-format data from a PDF. We’ll use the “REST Web Service: Post Method” action to call the PDF.co “/pdf/convert/to/json” API endpoint. We’ll also attach a “Message Box” action to display the output of the API call. The final bot structure will be as follows.

Screenshot of Automation Anywhere cloud Control Room

URI Configuration

Set the URI to “https://api.pdf.co/v1/pdf/convert/to/json” in the “REST Web Service: Post Method” action.

Screenshot of URI configuration (REST Web Services)

Header Configuration

Add a header named “x-api-key” and set its value to your PDF.co API key. This key is required for calling PDF.co APIs and can be obtained from https://pdf.co.

Screenshot of header configuration

Request Input Configuration

Configure the input data for the REST API call. Here, we use a very basic JSON request body, shown below, which asks PDF.co to convert the whole page to JSON format.

{"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-json/sample.pdf"}

Screenshot of request input configuration
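
For readers who want to check the endpoint outside Automation Anywhere, the URI, header, and request body configured above can be sketched in Python. This is a minimal sketch, not part of the bot itself: the function name build_pdf_to_json_request and the "YOUR_API_KEY" placeholder are our own illustrative assumptions, and the Content-Type header is an assumption about what a JSON request needs. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Endpoint used in the "REST Web Service: Post Method" action.
PDF_TO_JSON_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/json"


def build_pdf_to_json_request(api_key: str, pdf_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the bot is configured to make.

    The function name is hypothetical; it mirrors the three configuration
    steps: URI, "x-api-key" header, and JSON request body.
    """
    payload = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        PDF_TO_JSON_ENDPOINT,
        data=payload,
        headers={
            "x-api-key": api_key,
            # Assumption: the body is declared as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_pdf_to_json_request(
        api_key="YOUR_API_KEY",  # placeholder, replace with a real PDF.co key
        pdf_url=(
            "https://bytescout-com.s3-us-west-2.amazonaws.com/files/"
            "demo-files/cloud-api/pdf-to-json/sample.pdf"
        ),
    )
    # Show what would be sent; nothing goes over the network here.
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

To actually send the request, pass the returned object to urllib.request.urlopen with a valid API key.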

Configure REST API Output

Create an output variable of type dictionary that will hold the response of the API call.

Screenshot of REST API output configuration

Configure Message Box

Drag and drop the “Message Box” action and place it after the “REST Web Service: Post Method” action. Here, we’ll set “message to display” to the “Body” property of the output variable.

Screenshot of Message Box configuration
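
The “Body” property holds the raw response text, so it can help to see how that text might be interpreted. The sketch below parses a response body and pulls out the result URL. The field names "error", "message", and "url" are assumptions about the response shape and should be checked against the PDF.co documentation; the function name and the sample body are hypothetical.

```python
import json


def extract_result_url(body: str) -> str:
    """Return the result-file URL from a response body, or raise on failure.

    Assumed (unverified) response fields: "error" (bool), "message" (str),
    and "url" (str).
    """
    response = json.loads(body)
    if response.get("error"):
        raise RuntimeError(response.get("message", "PDF.co reported an error"))
    url = response.get("url")
    if not url:
        raise ValueError("response body has no 'url' field")
    return url


if __name__ == "__main__":
    # Hypothetical body, made up for illustration only.
    sample_body = '{"url": "https://example.com/result.json", "error": false}'
    print(extract_result_url(sample_body))
```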

Execute Bot and Review Output

With these configurations in place, we are ready to run the bot and check the output message box. First, let’s look at the input PDF file used in this API request. It looks as follows.

Screenshot of executed bot and output

This is what the bot’s output message box looks like.

Screenshot of Message from bot

If we open the file at the response URL in any text editor, we will see that it contains the PDF data in JSON format along with additional helpful properties such as font, page number, and element position.

Screenshot of PDF data in JSON
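
To get a quick overview of which properties the JSON output contains, a generic walk over the parsed document can count how often each key appears. This is a minimal sketch that makes no assumption about the actual PDF.co output layout; the function name count_keys and the tiny sample document are hypothetical.

```python
import json
from collections import Counter


def count_keys(node, counts=None) -> Counter:
    """Recursively count how often each key name occurs in parsed JSON."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            counts[key] += 1
            count_keys(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_keys(item, counts)
    return counts


if __name__ == "__main__":
    # Hypothetical sample, NOT the real PDF.co output structure.
    sample = json.loads('{"pages": [{"text": "a"}, {"text": "b"}]}')
    for key, occurrences in count_keys(sample).most_common():
        print(key, occurrences)
```

Running the same function on the downloaded result file would list the property names it actually uses.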

You have now learned how to convert PDF to JSON using Automation Anywhere and the PDF.co API.