How to Extract PDF to CSV with Blue Prism

In this tutorial, we will review how to convert a PDF to CSV format with Blue Prism. Drag and drop one Action onto the process and configure it. In the Name field, enter PDF to CSV. For the Business Object, select PDF.co Web API Services. For the Action, select PDF to CSV, then fill in the inputs based on the Postman collection, which I just downloaded from the PDF.co website.

  1. How to Start with PDF to CSV Conversion
  2. Convert PDF to CSV with Blue Prism

1. How to Start with PDF to CSV Conversion

I will put a link to the Postman collection in the description. The collection has the URL of the input PDF in the request body. Copy that URL and paste it into the Action properties. One more required parameter is the API key. I already have my API key here, so I copy and paste it in. With the API key added, the next step is to define some output variables.
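For readers who want to see what these two inputs amount to outside Blue Prism, here is a minimal sketch of the equivalent HTTP request. The endpoint path, the `x-api-key` header name, and the `url` field are assumptions based on how the PDF.co API is commonly called; confirm them against the Postman collection before relying on them. The function name is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify it against the Postman collection.
PDF_TO_CSV_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/csv"


def build_pdf_to_csv_request(api_key, pdf_url):
    """Build (but do not send) a POST request carrying the two required inputs."""
    # "url" is the assumed name of the field holding the input PDF link.
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    headers = {
        "x-api-key": api_key,  # assumed header name for the API key
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        PDF_TO_CSV_ENDPOINT, data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building and sending are kept separate so the inputs can be inspected first.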

In the outputs, first configure the URL output, Result_Url, and create a variable for it. In the same way, create a variable for the Body output, Result_body. We have now provided the minimum inputs and outputs, so let’s see how it performs. Draw a link to connect the stages. First, let’s see what the input PDF looks like. It contains invoice information, and we want all of the PDF data converted to CSV.
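The two outputs can be pictured as two fields read from the JSON response. The sketch below assumes the response carries `url`, `body`, `error`, and `message` fields; those names are assumptions to check against a real response, and `read_outputs` is a hypothetical helper, not part of the Blue Prism object.

```python
import json


def read_outputs(response_text):
    """Return (result_url, result_body) from a JSON response string."""
    data = json.loads(response_text)
    # Assumed convention: "error" is true when the conversion failed.
    if data.get("error"):
        raise RuntimeError(data.get("message", "conversion failed"))
    # Either field may be absent, so fall back to an empty string.
    return data.get("url", ""), data.get("body", "")
```

For example, `read_outputs('{"url": "https://example.com/result.csv", "error": false}')` gives a URL and an empty body, which is the case this first run produces.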

2. Convert PDF to CSV with Blue Prism

Now the process has been executed successfully. Open the URL to see what it contains. The file has been downloaded, and we can see that it contains all the data. Often, though, the requirement is to convert only a certain region of the page.

For example, in this document we processed the whole PDF and converted it to CSV, but in real-world scenarios the requirement is often that we want just this table converted to CSV. For that, we can use the rectangle parameter, giving the coordinates of the region we want to extract.

I just downloaded a PDF here. BYTESCOUT provides a free tool, PDF Multitool, which is useful for many PDF-related tasks. In this case, we will use it to get the coordinates of the region we need. By the way, I will include the link to download this tool in the description, so you can use it if you want. Now I select the table and copy its coordinates. They are the x and y coordinates, then the width and height. The coordinates have now been copied and pasted into the Action, so let’s see how it performs this time.
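As a rough illustration of the rectangle input, the sketch below joins the four values in the order described above (x, y, width, height) into one string. The comma-separated format and the `rect` field name are assumptions to verify against the Postman collection, and the numbers are made-up sample coordinates, not the ones from the video.

```python
def rect_parameter(x, y, width, height):
    """Format a region as 'x, y, width, height' for the rectangle input."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return ", ".join(str(value) for value in (x, y, width, height))


# Hypothetical payload: "rect" is the assumed name of the rectangle field.
payload = {
    "url": "https://example.com/invoice.pdf",
    "rect": rect_parameter(51.8, 114.8, 235.5, 204.0),
}
```

Validating the width and height up front catches a swapped or mistyped coordinate before any request is made.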

I refresh and start it again in Process Studio. We can see the result has been generated. The output contains only the table data, which is the region we needed. One more thing: when we want the result directly instead of a URL, we have to set the inline flag to True. With inline enabled, the output is returned inline. Now execute it. We have already mapped the output body to the body variable.

Let’s see what it contains. It contains all the data in CSV format, just as we configured it. That’s how easy it is to convert PDF to CSV. In the same way that we converted PDF to CSV here, and PDF to Text in the previous session, we can also convert PDF to JSON or XML.
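Once the inline result sits in the body variable, it is ordinary CSV text and can be split into rows. The sketch below uses Python’s standard `csv` module; the sample text is invented for illustration and is not the output from the invoice in the video.

```python
import csv
import io


def parse_csv_body(body):
    """Split inline CSV text into a list of rows, each a list of strings."""
    # newline="" lets the csv module handle line endings and quoted fields.
    return list(csv.reader(io.StringIO(body, newline="")))


# Invented sample of what an inline body might look like.
sample_body = 'Item,Qty,Price\r\n"Widget, large",2,9.50\r\n'
rows = parse_csv_body(sample_body)
```

Using a CSV parser rather than splitting on commas keeps quoted values such as `"Widget, large"` in a single cell.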

PDF.co and Blue Prism Integration

Download BluePrism plugin for PDF.co
