How to Convert PDF to XML Using Automation Anywhere and PDF.co

About Automation Anywhere

Automation Anywhere is a technology company headquartered in San Jose, California, USA. It provides an end-to-end Robotic Process Automation (RPA) and Artificial Intelligence (AI) platform for automating business processes. It offers Discovery Bot for fast-track automation, IQ Bot for transforming unstructured and hidden data with AI that can learn on its own, Bot Insight for real-time analytics, and a Bot Store of pre-built automation solutions.

This step-by-step tutorial, with screenshots, shows how to build a bot that uses PDF.co to convert PDF to XML.

In this tutorial, we assume that you already have a Community or Enterprise version of Automation Anywhere, with the Chrome extension and the client agent installed. We also assume that you know how to create a simple bot and run it on your machine.

Step by Step Guide

The steps below walk through configuring a bot in Automation Anywhere that calls the PDF.co API to convert PDF to XML.

PDF.co API Endpoint

We’ll use the PDF.co API endpoint for converting PDF to XML. Keep the method documentation handy, either in Postman or in the online documentation at https://developer.pdf.co, so that the API call can be configured as needed.

PDF.co API Endpoint

Bot Creation

In the Automation Anywhere cloud Control Room, create a new bot for extracting data in XML format from an input PDF. We’ll use the “Rest Web Service: Post Method” action to call the PDF.co “/pdf/convert/to/xml” API endpoint. We’ll also attach a “Message Box” action to the output of the API call. The final bot structure will be as follows.

Bot Creation

URI Configuration

Set the URI to “https://api.pdf.co/v1/pdf/convert/to/xml” in the “Rest Web Service: Post Method” action.

URI Configuration

Headers Configuration

Create a header key named “x-api-key” and set its value to your PDF.co API key. This key is required for calling PDF.co APIs. You can get an API key from your account at https://pdf.co.

Headers Configuration

Request Input Configuration

Configure the input data for the REST web API call. Here, we use a simple JSON request body that gives the URL of the input PDF and the name of the output file, shown below.

{"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-xml/sample.pdf", "name": "result.xml"}

Request Input Configuration
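For readers who want to check the same call outside the bot, the URI, header, and request body configured above can be sketched in Python. This is a minimal illustration using only the standard library; the helper names `build_pdf_to_xml_request` and `send` are hypothetical and not part of Automation Anywhere or PDF.co, and the `Content-Type` header is an assumption about what a JSON POST normally carries.

```python
import json
import urllib.request

# Endpoint named in the Bot Creation step.
PDF_TO_XML_URI = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_pdf_to_xml_request(api_key: str, pdf_url: str,
                             output_name: str = "result.xml") -> urllib.request.Request:
    """Build (but do not send) the POST request the bot is configured to make."""
    payload = {"url": pdf_url, "name": output_name}
    return urllib.request.Request(
        PDF_TO_XML_URI,
        data=json.dumps(payload).encode("utf-8"),
        # "x-api-key" comes from the Headers Configuration step;
        # "Content-Type" is an assumption for a JSON request body.
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network and a valid key)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_pdf_to_xml_request("YOUR_API_KEY", pdf_url))` with a real key should perform the same request as the “Rest Web Service: Post Method” action.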

Configure REST API Output

Create an output variable of type Dictionary to hold the API call response.

Configure REST API Output

Configure Message Box

Drag and drop the “Message Box” action and place it after the “Rest Web Service: Post Method” action. Set its “message to display” field to the “Body” property of the output variable.

Configure Message Box
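The “Body” property holds the raw response text. As a rough sketch of how that text could be processed further, the Python snippet below pulls the output file URL out of a JSON body. The field names `url`, `error`, and `message` are assumptions about the response shape and should be checked against the PDF.co documentation; the sample body is made up for illustration.

```python
import json


def extract_output_url(body: str) -> str:
    """Return the result file URL from a JSON response body, or raise on failure."""
    data = json.loads(body)
    # "error", "message" and "url" are assumed field names, not confirmed here.
    if data.get("error"):
        raise RuntimeError(data.get("message", "conversion failed"))
    url = data.get("url")
    if not url:
        raise ValueError("response has no 'url' field")
    return url


# Hypothetical response body, for illustration only.
sample_body = '{"url": "https://example.com/result.xml", "error": false}'
print(extract_output_url(sample_body))
```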

Execute Bot and Review Output

With these configurations in place, we can run the bot and see the output message box. The input PDF used in this bot looks like this.

Input PDF

The bot’s output message box looks as follows.

Message From Bot

If we download the file at the output URL and open it in any text editor, we see an XML representation of the input PDF data. In addition to the text itself, the output contains properties such as font, page number, and position.

XML Representation of Input PDF Data
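Once the XML file is downloaded, its text and properties can be read programmatically. The sketch below uses Python’s standard `xml.etree.ElementTree` and deliberately avoids depending on specific tag names: it collects every element that carries text, together with its attributes. The sample XML and its element and attribute names are hypothetical stand-ins, not the confirmed PDF.co schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real element and attribute names may differ.
SAMPLE_XML = """<document>
  <page index="0">
    <row>
      <column>
        <text fontName="Arial" fontSize="11.0" x="36.0" y="40.5">Invoice</text>
      </column>
    </row>
  </page>
</document>"""


def collect_text_nodes(xml_text: str) -> list[dict]:
    """Return every element that has non-blank text, with its tag and attributes."""
    root = ET.fromstring(xml_text)
    nodes = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text:
            nodes.append({
                "tag": element.tag,
                "text": text,
                "attributes": dict(element.attrib),
            })
    return nodes


for node in collect_text_nodes(SAMPLE_XML):
    print(node["text"], node["attributes"])
```

Because the walker does not hard-code tag names, it should keep working if the real output nests text differently, at the cost of not distinguishing pages or rows.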
