Extract Text Info from PDF and Turn into XML File (Neglecting Tables and Images) using PDF.co and Make

We prepared this step-by-step tutorial to show you how to extract text from a PDF and turn it into an XML file using PDF.co and Make.

We will use this sample PDF Invoice and convert it into XML.

Sample Source File

Step 1: Create a Scenario

First, click on the Create a Scenario button at the upper right corner of your dashboard.

Step 2: Add Module Google Drive

Next, add the Google Drive module to perform the first step in the scenario.

Add Module Google Drive

Step 3: Download a File

In this step, select the Download a File action to fetch our sample PDF file from our Google Drive folder.

Download a File

Step 4: Google Drive Connection

  • In the Enter a File ID field, choose the Select from the list option.
  • In the File ID field, choose the folder where the file is stored, then select the file.
Google Drive Connection

Step 5: Add PDF.co Module

Now, let’s add another module and select the PDF.co app.

Add PDF.co Module

Step 6: Convert from PDF

Under the Conversion from PDF category, choose the Convert from PDF module, which converts PDF pages into structured XML and other supported formats.

Convert from PDF

Step 7: Configure PDF.co Module

Now, let’s connect our PDF.co Account and configure the module.

  • In the Input File field, select the Upload a File option.
  • In the Source File field, choose the Google Drive – Download a File.
  • In the Convert Type field, select the PDF to XML option to turn our PDF into XML.

Let’s run the scenario to make sure there are no errors in the setup.

Configure PDF.co Module
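
For readers who want to see what this module configuration corresponds to outside Make, here is a minimal sketch of a direct request to the PDF.co Web API. It rests on several assumptions that are not part of this tutorial: the endpoint path, the `x-api-key` header, and the `url`, `name`, and `async` parameters should all be checked against the PDF.co API documentation, and `pdf_url` stands for a hypothetical publicly reachable link to the PDF (in the scenario above, the file is uploaded from Google Drive instead).

```python
import json
import urllib.request

# Assumed endpoint for PDF.co's PDF-to-XML conversion; verify against the API docs.
PDFCO_XML_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_xml_conversion_request(api_key, pdf_url, output_name="result.xml"):
    """Build (but do not send) a POST request asking PDF.co to convert a PDF to XML."""
    payload = {
        "url": pdf_url,       # hypothetical: a reachable link to the source PDF
        "name": output_name,  # file name for the generated XML
        "async": False,       # assumed flag: wait for the result in the same call
    }
    return urllib.request.Request(
        PDFCO_XML_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def convert_pdf_to_xml(api_key, pdf_url):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_xml_conversion_request(api_key, pdf_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload before spending any API credits.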

Step 8: Test Result

Excellent! The scenario ran successfully. You can find the file URL in the output to view the result.

Test Result
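
If you handle the module output in code rather than in the Make interface, picking out the file URL might look like the sketch below. The response fields used here (`error`, `message`, `url`) are assumptions about the shape of the output, and the sample dictionary is purely illustrative, so compare it with the actual output bundle before relying on it.

```python
def extract_result_url(response):
    """Return the output file URL from a conversion response dict, or raise on failure."""
    # Assumed convention: a truthy "error" field signals a failed conversion.
    if response.get("error"):
        raise RuntimeError(response.get("message", "conversion failed"))
    url = response.get("url")
    if not url:
        raise ValueError("response does not contain a file URL")
    return url


# Hypothetical response, for illustration only.
sample_response = {"error": False, "url": "https://example.com/result.xml"}
print(extract_result_url(sample_response))
```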

Step 9: Source File Output

Here is the sample PDF Invoice after we converted it to XML.

Source File Output
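
Once you have the XML file, you may want to pull the plain text back out of it. The sketch below uses Python's standard `xml.etree.ElementTree` and deliberately avoids depending on specific element names, because the exact schema of the generated XML is not described in this tutorial. `SAMPLE_XML` is a hypothetical stand-in, not real PDF.co output.

```python
import xml.etree.ElementTree as ET


def collect_text(xml_string):
    """Return every non-empty text fragment in the XML, in document order."""
    root = ET.fromstring(xml_string)
    # itertext() walks all nested elements, so no tag names need to be known.
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


# Hypothetical sample; the element names in the real output may differ.
SAMPLE_XML = """
<document>
  <page>
    <row><text>Invoice No. 123</text></row>
    <row><text>Total</text><text>100.00</text></row>
  </page>
</document>
"""

print(collect_text(SAMPLE_XML))
```

Because the function only collects text nodes, it keeps working if the element names or nesting differ from the sample, at the cost of discarding layout information such as rows and columns.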

In this tutorial, you learned how to extract text from a PDF and turn it into an XML file using PDF.co and Make. You also learned how to set up the PDF.co Convert from PDF module, which converts PDF to XML and other supported formats.
