How to Convert PDF to CSV with UiPath
In this session, we will look primarily at the PDF to CSV activity. As the name suggests, this activity converts PDF pages to CSV format. Depending on how we configure it, it can convert an entire PDF file, selected pages, or a specific region of a page.
We can provide the PDF file either as an online URL or as an absolute file path. Once the file is set, we can configure additional properties, such as the extraction region, the pages, and a password. The extraction region is the set of coordinates describing the area from which we want to extract text and convert it to CSV.
The Pages property specifies which pages to consider for extraction. If the input PDF is password protected, we can supply the password here. There are some output properties as well: if we want to reuse the extracted CSV text elsewhere in the workflow, we can create a variable for it in the corresponding property, and a separate property holds the URL of the resulting CSV file.
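Since this activity comes from the UiPath and PDF.co integration, it can help to see how its properties might map onto a conversion request. The Python sketch below only assembles a JSON body; it sends nothing. The endpoint `/v1/pdf/convert/to/csv` and the field names `url`, `pages`, `rect`, `password`, `name`, and `inline` reflect our understanding of the PDF.co REST API and should be checked against its current reference. The function `build_csv_request` is hypothetical.

```python
from typing import Optional


def build_csv_request(
    source_url: str,
    pages: Optional[str] = None,
    rect: Optional[str] = None,
    password: Optional[str] = None,
    output_name: str = "result.csv",
    inline: bool = False,
) -> dict:
    """Assemble a JSON body for a PDF-to-CSV conversion request.

    Hypothetical helper: field names are assumed to match PDF.co's
    /v1/pdf/convert/to/csv endpoint; verify them before relying on this.
    """
    if not source_url:
        raise ValueError("source_url is required")
    payload = {"url": source_url, "name": output_name, "inline": inline}
    # Optional settings mirror the activity properties; only send what is set.
    if pages is not None:
        payload["pages"] = pages  # assumed zero-based, e.g. "0" or "0-2"
    if rect is not None:
        payload["rect"] = rect  # extraction region coordinates
    if password is not None:
        payload["password"] = password  # for password-protected PDFs
    return payload


if __name__ == "__main__":
    body = build_csv_request(
        "https://example.com/sample-invoice.pdf",  # hypothetical URL
        pages="0",
    )
    print(body)
```

To actually submit such a body, one would presumably POST it to the endpoint with the API key in an `x-api-key` header (for example via `requests.post(endpoint, json=body, headers=...)`). The sketch assumes the PDF is already reachable by URL; a local absolute path would first need to be made available online.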
Let’s see this in the demo. I have a sample invoice file, and I want to convert this PDF to CSV format. For that, I provide the absolute path of this file as the input, along with the destination path. The output path should point to a file such as result.csv.
In the output text property, we right-click to create a variable and name it ExtractedCSV. Next, we connect a Message Box activity and pass it the variable we just created. When we run the workflow, the PDF is converted, and once the process completes the message box displays the CSV text, which contains all of the document's data in CSV format. The output file is also created at the destination path, and opening it shows the same data in CSV format.
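If the text held in ExtractedCSV needs further processing, for example in a Python step after the UiPath workflow, the standard library's `csv` module can split it into rows while respecting quoted cells. This is a minimal sketch; `parse_extracted_csv` is a hypothetical helper, and the sample invoice text is made up for illustration rather than taken from the demo output.

```python
import csv
import io


def parse_extracted_csv(csv_text: str) -> list[list[str]]:
    """Split CSV text (such as the ExtractedCSV value) into rows of cells."""
    reader = csv.reader(io.StringIO(csv_text))
    # Drop rows whose cells are all blank, e.g. empty lines between text blocks.
    return [row for row in reader if any(cell.strip() for cell in row)]


if __name__ == "__main__":
    # Made-up sample; real output depends on the PDF's layout.
    sample = 'Invoice,#123\n\nItem,Qty,Price\n"Widget, large",2,9.99\n'
    for row in parse_extracted_csv(sample):
        print(row)
```

Using `csv.reader` instead of splitting on commas matters here, because extracted cell text (such as "Widget, large") can itself contain commas.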
Now let’s try something extra. So far, we have converted the entire PDF file and all of its data to CSV format. What if we want to convert only the table content? As we saw earlier, the extraction region property lets us provide coordinates that restrict the conversion to the region we want.
To get the coordinates, we will use PDF Multitool, a freeware tool from ByteScout. If you want to use it, you can search for it online and install it. After opening the PDF, I select the region, and its coordinates appear at the top. Clicking them copies them to the clipboard.
Now go to the Extraction Region property and paste the coordinates there. Clear the previous output so that the new run starts fresh, then click Run.
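Because a mis-pasted region silently changes what gets extracted, it can help to sanity-check the coordinates before running the workflow. The sketch below assumes the region is four comma-separated numbers in the order x, y, width, height (in PDF points), which is our understanding of PDF.co's `rect` format; confirm the order against what PDF Multitool actually copies. `parse_rect`, `rect_from_corners`, and `format_rect` are hypothetical helpers, and the corner values are made up.

```python
def parse_rect(text: str) -> tuple[float, float, float, float]:
    """Parse 'x, y, width, height' (assumed order) into four floats."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    x, y, width, height = (float(p) for p in parts)
    # A zero or negative size would not select any usable area.
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return x, y, width, height


def rect_from_corners(x1: float, y1: float, x2: float, y2: float) -> tuple[float, float, float, float]:
    """Convert two opposite corners of a selection into (x, y, width, height)."""
    return min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1)


def format_rect(x: float, y: float, width: float, height: float) -> str:
    """Render a region as the comma-separated string pasted into the property."""
    return f"{x:g}, {y:g}, {width:g}, {height:g}"


if __name__ == "__main__":
    region = rect_from_corners(300, 50, 100, 250)  # hypothetical corners
    print(format_rect(*region))  # 100, 50, 200, 200
    print(parse_rect(format_rect(*region)))
```

`rect_from_corners` normalizes the selection so it does not matter which corner was clicked first.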
Once execution completes, the message box contains only the CSV data for the table inside the region we specified. Opening the output file confirms that it contains only the table within the extraction region. This is how we can use the PDF to CSV activity.
UiPath and PDF.co integration