How to Extract PDF to CSV with Blue Prism
In this tutorial, we'll walk through the process of converting a PDF file into CSV format.
Extract PDF to CSV with Blue Prism
Begin by dragging an action into the workspace and configuring it. For the name, we'll use 'PDF to CSV'. In the Business Object field, choose 'PDF.co Web API Services', and for the action, select 'PDF to CSV'. The configuration follows the Postman collection downloaded from the PDF.co website.
How to Start with PDF to CSV Conversion
I'll include a link to the Postman collection below. The URL, which appears in the request body in the collection, is ready. Copy it and paste it into the corresponding field in the action's properties. An API key is also required; I have mine on hand, so I'll paste it in as well. With the API key in place, the next step is to set up the output variables.
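For readers who want to see what the action sends, here is a minimal Python sketch of the equivalent HTTP request. The endpoint path, the `x-api-key` header name, and the `url` body field are assumptions based on the PDF.co Postman collection, so check them against the copy you downloaded. The function names are hypothetical and are not part of the Blue Prism plugin.

```python
import json
import urllib.request

# Assumed endpoint for PDF.co's PDF to CSV conversion; verify it in the Postman collection.
PDF_TO_CSV_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/csv"


def build_pdf_to_csv_request(api_key, pdf_url):
    """Build (but do not send) a POST request that asks for a CSV conversion."""
    payload = {"url": pdf_url}  # assumed field name for the source PDF location
    return urllib.request.Request(
        PDF_TO_CSV_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,  # assumed header name for the API key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_conversion(request):
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_conversion(build_pdf_to_csv_request(my_key, my_pdf_url))` would play the role of running the action, and the returned dictionary corresponds to the output variables configured in Blue Prism.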
Convert PDF to CSV with Blue Prism
The process has now run successfully. Opening the URL returned in the output starts the download, and we can see that the file includes all the required data.
Often, though, we only need to convert a certain section or region of a PDF document. For instance, although we've converted the entire PDF to CSV in this demonstration, in real-world scenarios we might only need a particular table within the document. To achieve this, we can specify the desired section using the rectangle parameter, which defines the coordinates of the region we wish to extract.
In this instance, I've downloaded a PDF and will be using the PDF Multitool by BYTESCOUT, a free tool offering various PDF-related functions. It's particularly handy for identifying and extracting specific regions within a PDF. I'll include a link to download this tool for those interested in using it. Now, I'm selecting the table and copying its coordinates, which consist of the x and y positions along with the width and height. With the coordinates pasted into the rectangle parameter, we can see how the conversion performs on the selected region.
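As a rough illustration of how the copied coordinates reach the API, the sketch below packs x, y, width, and height into a single string. The parameter name `rect` and the comma-separated layout are assumptions about the PDF.co request format, and the coordinate values in the usage note are made up; use the ones copied from PDF Multitool and confirm the format in the Postman collection.

```python
def rect_parameter(x, y, width, height):
    """Format a region as 'x, y, width, height' (assumed layout of the rect string)."""
    return ", ".join(str(float(value)) for value in (x, y, width, height))


def build_region_payload(pdf_url, x, y, width, height):
    """Request body that limits the conversion to one rectangular region."""
    return {
        "url": pdf_url,  # assumed field name for the source PDF location
        "rect": rect_parameter(x, y, width, height),  # assumed parameter name
    }
```

For example, `build_region_payload("https://example.com/sample.pdf", 51.8, 114.8, 235.5, 204)` yields a body whose `rect` value is `"51.8, 114.8, 235.5, 204.0"`.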
After refreshing and rerunning the process in the studio, we can see that the result has been produced. The output contains only the data from the table, that is, the region we targeted. There's one more option to consider: if we prefer to receive the result directly rather than through a URL, we need to set the inline flag to True. When this flag is enabled, the output is delivered inline.
With the inline flag set, we direct the output to the body variable and run the process again. On inspection, the data is in CSV format, exactly as configured. That is all it takes to convert PDF documents into CSV. Just as we've converted a PDF to CSV here and to text in a previous session, it's also possible to convert PDFs into other formats such as JSON or XML.
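To make the inline behavior concrete, here is a small sketch of reading the CSV text out of an inline response. It assumes the response is JSON with the CSV held in a `body` field and failures signaled by `error` and `message` fields; the sample response in the usage note is invented for illustration and is not real API output.

```python
import csv
import io


def build_inline_payload(pdf_url):
    """Request body asking for the CSV inline rather than as a download URL."""
    return {"url": pdf_url, "inline": True}  # field names are assumptions


def rows_from_inline_response(response):
    """Parse the CSV text held in the response's 'body' field into a list of rows."""
    if response.get("error"):
        # 'error' and 'message' are assumed field names for reporting failures.
        raise ValueError(response.get("message", "conversion failed"))
    return list(csv.reader(io.StringIO(response["body"])))
```

Given a response such as `{"error": False, "body": "Item,Price\nWidget,9.99\n"}`, `rows_from_inline_response` returns one list per CSV row, which is the same data the body variable holds in Blue Prism.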
PDF.co and Blue Prism Integration
Download Blue Prism Plugin for PDF.co
Need help and support? Please use the button below:
Request Support for Blue Prism + PDF.co