How to Split a Multi-Page PDF Invoice and Extract Data using Zapier

This tutorial demonstrates using the PDF.co API to split a PDF containing 5 invoices into individual files. It shows how to identify a keyword to split the PDF, extract details from each invoice, and add the data to a Google Spreadsheet using the Zapier platform integration.

Sample PDF Invoice
Sample PDF Invoice

Note

This guide uses Google Sheets, but it can also be helpful for users of other apps like Xero and QuickBooks.

Here are the simple step-by-step instructions to automate the process using PDF.co and Zapier.

Step 1: Google Drive New File in Folder

Use the Google Drive New File in Folder option to select the specific PDF file containing the multiple invoices that you need to process.

Step 2: Setup Google Drive Configuration

  • Select My Google Drive as the drive to be used for this automation.
  • Choose the specific folder within your Google Drive where the PDF file containing the multiple invoices is located.
New File in Folder

Step 3: Test Trigger

Test trigger to access the specific file in your Google Drive folder.

Test Record

Step 4: Add PDF.co App

Add the PDF.co app and the Split PDF Based on Text Search feature to split the PDF invoice into individual invoices.

Split PDF Based on Text Search

Step 5: Setup PDF.co Configuration

  • Set the Source PDF URL property to point to the PDF file you want to process.
  • In the Search String field, enter the unique keyword or text that appears on each page where you want to split the PDF. The splitter will split the document at every page containing this keyword.
Text Search String

Step 6: Test PDF.co Action

Now, let’s proceed to test the action to split the PDF invoice into individual invoices.

Test Action

Step 7: Formatter by Zapier

Let's use the Formatter by Zapier to get the link for each of the split pages.

Formatter by Zapier

Step 8: Setup Formatter Utilities

  • In the Transform field, select Line - item to Text.
  • In the Input field, choose the Urls generated by the PDF Split module.
  • Set the Separator field to ||, which will be used to separate the individual invoice URLs.
Format Action

Step 9: Test Formatter

Let's test the Formatter by Zapier to format the PDF invoice URLs for extraction.

Output URL

Step 10: Looping by Zapier

Let's integrate the Looping by Zapier to our workflow to process each invoice URL individually. We'll use the Create Loop From Text option.

Create Loop

Step 11: Setup Looping from Text

  • In the Values to Loop field, enter the variable name for the invoice URLs. This variable will reference each URL in the loop. In the next text box, select the Output Text from the previous module.
  • Enter the separator || as the Text Delimiter.
Configure Loop From Text

Step 12: Test Looping from Text

Let's test the Looping by Zapier to iterate through each invoice URL.

Test Looping From Text

Step 13: PDF.co Document Parser

Let's add the PDF.co Document Parser feature to extract information from each of these individual invoices.

Document Parser

Step 14: Setup Document Parser

Document Parser

Step 15: Test Document Parser

After setting up the configuration, test the action to extract the data from the PDF invoice.

Test Document Parser

Step 16: Add Google Sheets

Let's integrate the Google Sheets action to save the extracted data from the invoices and select the Create Multiple Spreadsheet Rows option to save the extracted data from each invoice into separate rows.

Google Sheets

Step 17: Setup Google Sheets

  • Specify My Google Drive as the storage location.
  • Choose the target Spreadsheet and Worksheet to save the extracted data.
  • Map the parsed invoice data to the corresponding columns in the Google Sheet
Create Multiple Spreadsheet Rows
Create Multiple Spreadsheet Rows in Google Sheets

Step 18: Test Action and See Result

Now, test the action and then open your Google Sheets spreadsheet to review the result.

In conclusion, this tutorial has shown you how to efficiently split a large PDF with multiple invoices into individual files using a specific keyword. By using the PDF.co API and the Zapier platform, you can easily extract invoice details and add them to a Google Spreadsheet. This automated process simplifies invoice management, enhances data accuracy, and saves time, allowing you to focus on more important business tasks.