Extract Text Info from PDF and Turn into XML File (Neglecting Tables and Images) using PDF.co and Zapier

Sep 2, 2024·2 Minutes Read

In this tutorial, we will show you how to extract the text from a PDF and turn it into an XML file using PDF.co and Zapier.

This is the Sample Source File that we will use for this demonstration.

IN THIS TUTORIAL

Make A Zap

Choose PDF.co App

PDF To Anything Converter

Connect the PDF.co Account

Set Up the Action

Test and Review

Test Result

Source File Output

Step 1: Make A Zap

First, click the Make a Zap button at the upper left corner of your dashboard.

Step 2: Choose PDF.co App

Next, select the PDF.co app for the App Event.

Step 3: PDF To Anything Converter

For the Action Event, select the PDF to Anything Converter to convert PDF to JPG, PNG, CSV, JSON, XML, and other formats.

Step 4: Connect the PDF.co Account

Now, let’s connect our PDF.co account so that the Zap can run.

Step 5: Set Up the Action

Next, let’s set up the Action and fill out the Output Format, PDF URL, and Pages.

For the Output Format, choose XML Code because we want to extract the text from our PDF and turn it into XML.
In the PDF URL field, enter the URL of the source PDF document. If the file is stored in a cloud service such as Google Drive or Dropbox, set its sharing option to Anyone with link so that PDF.co can access it.
In the Pages field, type 0 to select page 1. Page numbering starts at 0, so 0 refers to the first page.
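
For readers who want to see what these three fields amount to outside Zapier, here is a minimal Python sketch that builds a comparable request without sending it. It assumes the PDF.co web API exposes a PDF-to-XML endpoint at the URL below, authenticates with an x-api-key header, and accepts url and pages in a JSON body. Please check these against the PDF.co API documentation before relying on them. The helper names to_zero_based_pages and build_request are our own and are not part of any library.

```python
import json
import urllib.request

# Assumed endpoint for PDF-to-XML conversion; verify it in the PDF.co API docs.
API_URL = "https://api.pdf.co/v1/pdf/convert/to/xml"


def to_zero_based_pages(page_numbers):
    """Convert human page numbers (1, 2, ...) to a zero-based, comma-separated string."""
    if any(n < 1 for n in page_numbers):
        raise ValueError("page numbers start at 1")
    return ",".join(str(n - 1) for n in page_numbers)


def build_request(api_key, pdf_url, page_numbers):
    """Build (but do not send) a request that mirrors the fields set in the Zap."""
    payload = {
        "url": pdf_url,                              # PDF URL field
        "pages": to_zero_based_pages(page_numbers),  # Pages field (zero-based)
    }
    # The Output Format (XML) is expressed here by the endpoint itself.
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key and URL; nothing is sent over the network here.
    request = build_request("YOUR_API_KEY", "https://example.com/sample.pdf", [1])
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send the request: urllib.request.urlopen(request)
```

The conversion helper makes the zero-based rule explicit: asking for page 1 produces the value 0, the same value typed into the Pages field.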

Step 6: Test and Review

Now, click the Test and Review button to make sure that there are no errors in our configuration.

Step 7: Test Result

Excellent! Our test was successful. PDF.co returned a URL so that we can view the output. You can now turn on the Zap.
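
If you handle the result in your own code rather than in Zapier, the same check can be sketched in Python as follows. The field names error, message, and url are assumptions about the shape of the response, and the sample body is hypothetical. Compare them with the actual test result before using this.

```python
import json


def extract_output_url(response_body):
    """Return the output file URL from a conversion response, or raise on failure."""
    result = json.loads(response_body)
    # Assumed convention: a truthy "error" field signals a failed conversion.
    if result.get("error"):
        raise RuntimeError(result.get("message", "conversion failed"))
    url = result.get("url")
    if not url:
        raise RuntimeError("response contained no output URL")
    return url


if __name__ == "__main__":
    # Hypothetical response body, for illustration only.
    sample = '{"error": false, "status": 200, "url": "https://example.com/output.xml"}'
    print(extract_output_url(sample))
```

Raising an exception on failure keeps a later step from trying to download a URL that was never produced.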

Step 8: Source File Output

When you open the output URL, the source file output looks like this.
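
Once you have the XML file, you may want just the text back out of it. The sketch below uses Python's standard xml.etree.ElementTree module to walk the whole document and collect every non-empty piece of text, so it does not depend on specific tag names. The sample XML is hypothetical, and the element names in the real output may differ.

```python
import xml.etree.ElementTree as ET


def collect_text(xml_string):
    """Return every non-empty text fragment in the XML, in document order."""
    root = ET.fromstring(xml_string)
    # itertext() yields the text of each element and the text between elements.
    return [fragment.strip() for fragment in root.itertext() if fragment.strip()]


if __name__ == "__main__":
    # Hypothetical sample; the real output's element names may differ.
    sample = (
        "<document>"
        "<page index='0'>"
        "<row><column><text>Invoice</text></column></row>"
        "<row><column><text>Total: 42.00</text></column></row>"
        "</page>"
        "</document>"
    )
    for line in collect_text(sample):
        print(line)
```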

In this tutorial, you learned how to extract the text from a PDF and turn it into an XML file using PDF.co and Zapier. You also learned how to set up the PDF to Anything Converter, which can convert a PDF to text, XLS, CSV, JSON, XML, and image formats.