Extract Text from a PDF and Turn It into an XML File (Ignoring Tables and Images) Using PDF.co and Make

Sep 10, 2024·4 Minutes Read

This step-by-step tutorial shows how to extract the text from a PDF and turn it into an XML file using PDF.co and Make.

We will use a sample PDF invoice and convert it to XML.

IN THIS TUTORIAL

Create a Scenario

Add Module Google Drive

Download a File

Google Drive Connection

Add PDF.co Module

Convert from PDF

Configure PDF.co Module

Test Result

Source File Output

Step 1: Create a Scenario

First, click the Create a Scenario button in the upper-right corner of your Make dashboard.

Step 2: Add Module Google Drive

Next, add the Google Drive module to perform the first step in the scenario.

Step 3: Download a File

In this step, select the Download a File action to fetch the sample PDF from your Google Drive folder.

Step 4: Google Drive Connection

In the Enter a File ID field, choose the Select from the list option.
In the File ID field, browse to the folder where the file is stored and select the sample PDF.

Step 5: Add PDF.co Module

Now, let’s add another module and select the PDF.co app.

Step 6: Convert from PDF

Under the Conversion from PDF category, choose the Convert from PDF module, which converts PDF pages into structured XML and other supported formats.

Step 7: Configure PDF.co Module

Now, let’s connect our PDF.co account and configure the module.

In the Input File field, select the Upload a File option.
In the Source File field, choose Google Drive – Download a File.
In the Convert Type field, select PDF to XML.
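
For readers curious what a comparable request might look like outside Make, here is a minimal Python sketch that builds (but does not send) a PDF-to-XML conversion request. It rests on several assumptions that do not come from this tutorial: the endpoint path, the x-api-key header, and the url, name, and async fields follow our understanding of the PDF.co REST API and should be checked against the official documentation. The function name build_pdf_to_xml_request is hypothetical. The sketch also assumes the PDF is reachable by a URL, whereas the Make module receives the file directly from Google Drive.

```python
import json
import urllib.request

# Assumed endpoint for PDF-to-XML conversion; verify against the PDF.co API docs.
PDFCO_XML_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_pdf_to_xml_request(api_key, source_url, output_name="result.xml"):
    """Build (but do not send) a PDF-to-XML conversion request."""
    payload = {
        "url": source_url,    # link to the source PDF (assumed field name)
        "name": output_name,  # file name for the generated XML (assumed field name)
        "async": False,       # ask for a synchronous conversion (assumed field name)
    }
    return urllib.request.Request(
        PDFCO_XML_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request. Building it separately makes it easy to inspect the URL, headers, and body before spending any API credits.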

Let’s run the scenario to make sure there are no errors in the setup.

Step 8: Test Result

Excellent! The scenario ran successfully. The module output contains the URL of the converted file; open it to view the result.
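
If you handle the conversion result in code rather than in the Make interface, pulling out the file URL could look like the sketch below. The field names url, error, and message are assumptions about the shape of the response, the sample body is made up for illustration, and extract_result_url is a hypothetical helper.

```python
import json


def extract_result_url(response_text):
    """Return the output file URL from a JSON response, or raise on failure."""
    body = json.loads(response_text)
    if body.get("error"):
        # "message" is an assumed field name for the error description.
        raise RuntimeError(body.get("message", "conversion failed"))
    url = body.get("url")
    if not url:
        raise ValueError("response contains no 'url' field")
    return url


# Hypothetical response body, for illustration only.
sample = '{"url": "https://example.com/result.xml", "error": false}'
print(extract_result_url(sample))
```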

Step 9: Source File Output

Here is the sample PDF Invoice after we converted it to XML.
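
Once you have the XML file, you may want to pull the plain text back out of it. The following sketch uses Python's standard xml.etree.ElementTree module and deliberately makes no assumption about the element names in the output: it simply collects the text of every element in document order. The sample XML and its invoice values are hypothetical and are not actual PDF.co output.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string):
    """Collect the non-empty text of every element, in document order."""
    root = ET.fromstring(xml_string)
    lines = []
    for element in root.iter():
        # element.text is the text directly inside the opening tag;
        # text that follows a child element (the "tail") is ignored here.
        text = (element.text or "").strip()
        if text:
            lines.append(text)
    return lines


# Hypothetical XML for illustration; real output may use different tags.
sample_xml = """<document>
  <page index="0">
    <row><column><text>Invoice No. 123</text></column></row>
    <row><column><text>Total: 42.00</text></column></row>
  </page>
</document>"""

for line in extract_text(sample_xml):
    print(line)
```

Walking every element keeps the helper usable even if the real file nests its text differently from the sample.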

In this tutorial, you learned how to extract the text from a PDF and turn it into an XML file using PDF.co and Make. You also learned how to set up the PDF.co Convert from PDF module, which converts PDFs to XML and other supported formats.