Extract Text Info from PDF and Turn into XML File (Neglecting Tables and Images) using PDF.co and Make
This step-by-step tutorial shows how to extract text from a PDF and convert it into an XML file using PDF.co and Make.
We will use a sample PDF invoice and convert it into XML.
Step 1: Create a Scenario
First, click on the Create a Scenario button at the upper right corner of your dashboard.
Step 2: Add Module Google Drive
Next, add the Google Drive module to perform the first step in the scenario.
Step 3: Download a File
In this step, select the Download a File action to fetch the sample PDF from our Google Drive folder.
Step 4: Google Drive Connection
- In the Enter a File ID field, choose the Select from the list option.
- In the File ID field, browse to the folder where the file is stored and select the sample PDF.
Step 5: Add PDF.co Module
Now, let’s add another module and select the PDF.co app.
Step 6: Convert from PDF
Under the Conversion from PDF category, choose the Convert from PDF module, which converts PDF pages into structured XML and other supported formats.
Step 7: Configure PDF.co Module
Now, let’s connect our PDF.co Account and configure the module.
- In the Input File field, select the Upload a File option.
- In the Source File field, choose the Google Drive – Download a File option.
- In the Convert Type field, select PDF to XML to turn our PDF into XML.
Now let’s run the scenario to confirm there are no errors in the setup.
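For readers curious about what this configuration amounts to outside of Make, the same conversion can be sketched as a direct request to the PDF.co Web API. This is only a minimal sketch, not what Make runs internally. It assumes the `pdf/convert/to/xml` endpoint, an `x-api-key` header, and a PDF that is already reachable at a URL. The function name, the placeholder key, and the example link are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for PDF to XML conversion; verify it against the PDF.co API docs.
PDFCO_XML_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_pdf_to_xml_request(api_key, pdf_url, output_name="result.xml"):
    """Build (but do not send) a POST request asking PDF.co to convert a PDF to XML."""
    payload = {
        "url": pdf_url,       # link to the source PDF (assumed to be reachable)
        "name": output_name,  # file name for the generated XML
        "async": False,       # wait for the result instead of polling a job
    }
    return urllib.request.Request(
        PDFCO_XML_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    request = build_pdf_to_xml_request(
        "YOUR_API_KEY", "https://example.com/sample-invoice.pdf"
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

The request is built separately from sending it so the payload can be inspected first, much like reviewing the module fields in Make before running the scenario.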
Step 8: Test Result
Excellent! The scenario ran successfully. You can find the file URL in the module output and open it to view the result.
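If you handle the output in code rather than in the Make interface, picking out the file URL could look roughly like the sketch below. The field names `error`, `message`, and `url` are assumptions about the shape of the response, and the sample dictionary is made up for illustration rather than copied from a real run.

```python
def extract_result_url(response):
    """Return the output file URL from a parsed response, or raise if the run failed."""
    if response.get("error"):
        # Surface the service's own message when one is provided.
        raise RuntimeError(response.get("message", "conversion failed"))
    url = response.get("url")
    if not url:
        raise ValueError("response does not contain a file URL")
    return url


if __name__ == "__main__":
    # Made-up response for illustration; not real output.
    sample = {"error": False, "url": "https://example.com/result.xml"}
    print(extract_result_url(sample))
```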
Step 9: Source File Output
Here is the sample PDF Invoice after we converted it to XML.
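Once you have the XML, pulling the plain text back out takes only a few lines. The sketch below uses Python's standard `xml.etree.ElementTree` and deliberately ignores tag names, because the exact element layout of the PDF.co output is not shown here. The sample XML is hypothetical and the real structure may differ.

```python
import xml.etree.ElementTree as ET


def collect_text(xml_string):
    """Return every non-empty text fragment in document order, ignoring tag names."""
    root = ET.fromstring(xml_string)
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


if __name__ == "__main__":
    # Hypothetical XML for illustration; the real PDF.co layout may differ.
    sample = """<document>
      <page index="0">
        <row><column><text>Invoice No.</text></column><column><text>12345</text></column></row>
        <row><column><text>Total</text></column><column><text>100.00</text></column></row>
      </page>
    </document>"""
    for fragment in collect_text(sample):
        print(fragment)
```

Walking the tree with `itertext()` keeps the example independent of specific element names, at the cost of discarding layout information such as rows and columns.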
In this tutorial, you learned how to extract text from a PDF and turn it into an XML file using PDF.co and Make. You also learned how to set up the PDF.co Convert from PDF module, which converts PDF to XML and other supported formats.