How to Split a Multi-Page PDF Invoice and Extract Data using Make

This tutorial shows you how to split a large PDF containing five invoices into individual files by searching for a specific keyword. You'll then extract details from each invoice and add them to a Google Sheets spreadsheet using the PDF.co API and the Make platform.

Sample PDF Invoice

Note

This guide uses Google Sheets, but it can also be helpful for users of other apps like Xero and QuickBooks.

(Using Zapier? Check out this tutorial.)

Here are the steps to automate the process using PDF.co and Make.

Step 1: Google Drive Watch Files in Folder

Let’s start with the Google Drive “Watch Files in a Folder” module, which monitors a specific folder on your Google Drive. When a file in that folder matches the watch setting you choose in the next step (for example, a newly created file), the module triggers the scenario so the file can be processed automatically.

Step 2: Setup Google Drive Configuration

  • Select "By Created Time" to Watch Files.
  • Choose "My Drive" as the Drive to be used.
  • Enter the Folder name that you want to watch.
  • Select "All" for File Types to Watch.
  • Enter the limit value for the number of results to be processed during one execution cycle.
Google Drive Configuration
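
To make the "By Created Time" and limit settings more concrete, here is a minimal Python sketch of the idea behind them. It is only a conceptual illustration, not how Make implements the module: the function `select_new_files`, the file names, and the timestamps are all hypothetical.

```python
from datetime import datetime, timezone


def select_new_files(files, last_seen, limit):
    """Return up to `limit` files created after `last_seen`, oldest first."""
    newer = [f for f in files if f["created"] > last_seen]
    newer.sort(key=lambda f: f["created"])
    return newer[:limit]


# Hypothetical folder listing; names and timestamps are made up.
folder = [
    {"name": "invoices-jan.pdf", "created": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"name": "invoices-feb.pdf", "created": datetime(2024, 2, 5, tzinfo=timezone.utc)},
    {"name": "invoices-mar.pdf", "created": datetime(2024, 3, 5, tzinfo=timezone.utc)},
]
last_run = datetime(2024, 1, 31, tzinfo=timezone.utc)

# With a limit of 1, only the oldest unseen file is handled in this cycle.
picked = select_new_files(folder, last_run, limit=1)
print([f["name"] for f in picked])
```

With a limit of 1, the remaining new file would be picked up in the next execution cycle, which is roughly what the limit setting controls.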

Step 3: Google Drive Download a File

Add another Google Drive module and select the "Download a File" feature to download the PDF file you need to process.

Step 4: Setting Up Google Drive

  • Choose the "Select from the list" option to enter a File ID.
  • Select "My Drive" as the Drive to be used.
  • Enter the folder and file name you would like to use.
Google Drive Set Up

Step 5: PDF.co Split a PDF Module

With the Google Drive modules in place, add the PDF.co "Split a PDF" module to split the PDF into individual invoices based on a text search.

Step 6: Split a PDF Configuration

  • Select the 'Upload a file' option to automatically input the downloaded PDF file from Google Drive.
  • Then, choose the 'Split by Text Found or Barcode' feature to split the PDF invoice into individual invoices using a text search.
  • Enter the keyword or text you want to use to split the PDF in the Search String field.
Split a PDF Configuration
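
For readers curious about what this configuration might look like outside Make, the sketch below builds a comparable request against the PDF.co API directly. Treat it as a hedged illustration: the endpoint path, the `url` and `searchString` parameter names, and the `urls` field in the response are assumptions to verify against the PDF.co API reference, and the sketch assumes the PDF is already reachable at a URL rather than uploaded as in the Make module.

```python
import json
import urllib.request

# Assumption: PDF.co's "split by text search" endpoint; check the API docs.
SPLIT_ENDPOINT = "https://api.pdf.co/v1/pdf/split2"


def build_split_request(api_key, file_url, search_string):
    """Build (but do not send) a POST request that splits a PDF at a keyword."""
    payload = {"url": file_url, "searchString": search_string}
    return urllib.request.Request(
        SPLIT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def split_pdf(api_key, file_url, search_string):
    """Send the request and return the list of output file URLs."""
    request = build_split_request(api_key, file_url, search_string)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    if result.get("error"):
        raise RuntimeError(result.get("message", "split failed"))
    return result["urls"]  # assumed response field


# Hypothetical values for illustration only; nothing is sent here.
request = build_split_request("YOUR_API_KEY", "https://example.com/invoices.pdf", "Invoice No")
print(request.get_method(), request.full_url)
```

Calling `split_pdf` with a real API key would perform the network request; `build_split_request` alone only shows what would be sent.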

Step 7: Run Scenario to Split Invoices

Once setup is complete, run the scenario to split the PDF into individual invoices.

Output

Step 8: Iterator Tool

Let's use the Iterator tool to loop through the URLs of the split files so that each invoice is processed one at a time.

Iterator Tool
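
Conceptually, the Iterator turns one array of URLs into a series of separate items that the following modules handle one by one. A minimal Python sketch of that idea, using a hypothetical `iterate` helper and made-up URLs:

```python
def iterate(urls):
    """Yield one item per URL, numbered from 1, so each can be handled separately."""
    for position, url in enumerate(urls, start=1):
        yield {"position": position, "url": url}


# Hypothetical output of the split step.
split_urls = [
    "https://example.com/invoice-1.pdf",
    "https://example.com/invoice-2.pdf",
]

bundles = list(iterate(split_urls))
for bundle in bundles:
    print(bundle["position"], bundle["url"])
```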

Step 9: PDF.co Parse a Document Module

Next, let's add the 'PDF.co Parse a Document' module to parse the data from each individual invoice.

Step 10: Setup Parse a Document Module

Document Parser Configuration
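
As a rough picture of what the parsing step works with, the sketch below pairs an invoice URL with a parser template and then flattens a parser-style result into simple name/value pairs. Everything here is an assumption made for illustration: the `templateId` parameter name, the shape of `sample_result`, and the field names are hypothetical and should be checked against the PDF.co Document Parser documentation.

```python
def build_parser_payload(invoice_url, template_id):
    """Pair one split invoice with the parser template to apply (assumed names)."""
    return {"url": invoice_url, "templateId": template_id}


def fields_from_result(result):
    """Flatten an assumed parser result into a {field name: value} dict."""
    objects = result.get("body", {}).get("objects", [])
    return {
        obj["name"]: obj.get("value")
        for obj in objects
        if obj.get("objectType") == "field"
    }


# Hypothetical result shape and values, not real parser output.
sample_result = {
    "body": {
        "objects": [
            {"name": "Invoice No", "objectType": "field", "value": "INV-001"},
            {"name": "Total", "objectType": "field", "value": "120.00"},
            {"name": "Line Items", "objectType": "table", "rows": []},
        ]
    }
}

fields = fields_from_result(sample_result)
print(fields)
```

Only entries marked as fields are kept in this sketch; table data would need separate handling.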

Step 11: Run Scenario to Parse Data

Now, let’s run the scenario to parse the data from each individual invoice.

Parse Data

Step 12: Google Sheets Module

Now that we've parsed the data from all the invoices, let's add the Google Sheets module and map the extracted data to the corresponding fields in Google Sheets.

Step 13: Setup Google Sheets Module

  • Choose the 'Select by path' method.
  • Select 'My Drive' as the drive.
  • Enter the Spreadsheet ID and Sheet name.
  • Map the parsed data values to the respective columns in the Google Sheet.
Google Sheets
Values
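
The mapping step boils down to placing each parsed value under the right column. The Python sketch below illustrates that with an in-memory CSV standing in for the sheet. The column names and values are hypothetical, and this does not call the Google Sheets API.

```python
import csv
import io

# Hypothetical column layout of the target sheet.
COLUMNS = ["Invoice No", "Invoice Date", "Total"]


def to_row(fields, columns=COLUMNS):
    """Order parsed values by column; missing fields become empty cells."""
    return [fields.get(column, "") for column in columns]


# Hypothetical parsed data with one field missing.
parsed = {"Total": "120.00", "Invoice No": "INV-001"}
row = to_row(parsed)

# An in-memory CSV stands in for the spreadsheet here.
sheet = io.StringIO()
writer = csv.writer(sheet)
writer.writerow(COLUMNS)
writer.writerow(row)
print(sheet.getvalue())
```

Leaving missing fields as empty cells keeps every row aligned with the header, which makes gaps in the parsed data easy to spot when reviewing the sheet.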

Step 14: Run Scenario and View Result

Now, run the scenario and open Google Sheets to review the result.

Google Sheets

This tutorial showed how to split a large PDF containing multiple invoices into individual files using a specific keyword, extract the details of each invoice with the PDF.co API, and add them to a Google Sheets spreadsheet using the Make platform. Automating these steps removes manual splitting and data entry from invoice processing, which saves time and reduces the risk of transcription errors.