How to Extract Tables Containing Text from PDF using PDF.co and Zapier

Sep 2, 2024·5 Minutes Read

Extract Tables with Text from PDF

In this tutorial, we will show you how to extract tables containing text from a PDF using PDF.co and Zapier.

We will use this 2-page sample PDF and extract the table line items, which span both pages.

Sample Source File

Make A Zap

First, click the Make a Zap button in the upper left corner of your Zapier dashboard.

Add App Event

Now, select PDF.co as the App Event.

Screenshot of selecting App Event

Add Action Event

Under Action Event, select Document Parser, which parses invoices and other documents using templates.

Screenshot of adding Action Event

Connect PDF.co Account

In the Choose Account field, select your PDF.co account.

Screenshot of connecting PDF.co account

Document Parser Configuration

Now, we can set up the action.

  • In the Input field, enter the source file link. If you use Google Drive, Dropbox, or another service, set the file-sharing setting to Anyone with a link so that PDF.co can access the file.
  • In the Template ID field, type in the ID of the Document Parser template. Creating this template is covered in the Create Document Parser Template section below.

Screenshot of Document Parser configuration

Test Document Parser

Let’s click Test & Review to send our Document Parser configuration to PDF.co and make sure that we set it up correctly.

Screenshot of testing Document Parser

Test Result

Great! The test was successful. In the inline result, you will see all the table line items from 1 to 45. You can then turn the Zap on.

Screenshot of test result
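
Because the table spans two pages, it can be worth confirming that no line item was dropped at the page break. The sketch below is one way to check that the item numbers run from 1 to the expected last item without gaps. It assumes the parsed rows have already been loaded into a list of dictionaries keyed by the template's column names; the exact shape of the Zapier output is not shown here, and `missing_item_numbers` is a hypothetical helper, not part of PDF.co or Zapier.

```python
def missing_item_numbers(rows, last_expected):
    """Return the item numbers in 1..last_expected that are absent from rows."""
    # Assumption: each row is a dict with an "itemNo" key, as in the template columns.
    seen = {int(row["itemNo"]) for row in rows}
    return [n for n in range(1, last_expected + 1) if n not in seen]


# Made-up rows standing in for the parsed output; item 3 is deliberately absent.
rows = [{"itemNo": 1}, {"itemNo": 2}, {"itemNo": 4}]
print(missing_item_numbers(rows, 4))  # [3]
```

For the sample file in this tutorial, calling the helper with `last_expected=45` on the full result should return an empty list.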

Create Document Parser Template

Next, we will show you how to create the template that extracts a multi-page table.

Load Test PDF

Click the Load Test PDF or Image button to open the PDF document that you will use for your template.

Screenshot of loading test PDF

Name and Regex

You can rename the object in the Name field. Make sure to check the Regex box when using any Text Search objects.

Screenshot of renaming object
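
To see why the Regex box matters, here is a small Python sketch comparing a pattern search with a plain-text search. It assumes that with the box unchecked the expression is looked up as literal text; that behavior is an assumption for illustration, and the sample line is made up.

```python
import re

line = "TOTAL     1234.50"   # made-up line of PDF text
expression = r"TOTAL\s+\d+"

# Regex on: the expression is interpreted as a pattern.
as_regex = re.search(expression, line) is not None

# Regex off (assumed behavior): the expression is searched for as plain text.
as_literal = expression in line

print(as_regex, as_literal)  # True False
```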

Add TABLE Field Based on TEXT SEARCH

Now, click the Add Object button and select Add TABLE Field based on TEXT SEARCH.

Screenshot of adding table field

Add Table Object Expression

In the Edit Template window, add the start and end expressions, the row expression, and the columns. Each named group in the row expression corresponds to one column. Then, set multipage to true.

{
  "start": {
    "expression": "Item{{Spaces}}Description{{Spaces}}Price",
    "regex": true
  },
  "end": {
    "expression": "TOTAL{{Spaces}}{{Number}}",
    "regex": true
  },
  "row": {
    "expression": "{{LineStart}}{{Spaces}}(?<itemNo>{{Digits}}){{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<price>{{Number}}){{Spaces}}(?<qty>{{Digits}}){{Spaces}}(?<extPrice>{{Number}})",
    "regex": true
  },
  "columns": [
    {
      "name": "itemNo",
      "dataType": "integer"
    },
    {
      "name": "description",
      "dataType": "string"
    },
    {
      "name": "price",
      "dataType": "decimal"
    },
    {
      "name": "qty",
      "dataType": "integer"
    },
    {
      "name": "extPrice",
      "dataType": "decimal"
    }
  ],
  "multipage": true
}

Then Run the template to make sure there are no errors.
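
To get a feel for how the start, end, and row expressions and the multipage flag work together, the following Python sketch imitates the idea on plain text. It is not PDF.co's implementation. The `MACROS` table holds hypothetical plain-regex stand-ins for the template macros, each capture group is named after its column (an assumption about how groups map to columns), and the sample pages are made up.

```python
import re

# Hypothetical plain-regex stand-ins for the template macros. These are
# assumptions for illustration, not PDF.co's actual definitions.
MACROS = {
    "LineStart": r"^",
    "Spaces": r"[ \t]+",
    "Digits": r"\d+",
    "Number": r"\d+(?:\.\d+)?",
    "SentenceWithSingleSpaces": r"\S+(?: \S+)*",
}

TEMPLATE = {
    "start": "Item{{Spaces}}Description{{Spaces}}Price",
    "end": "TOTAL{{Spaces}}{{Number}}",
    "row": (
        "{{LineStart}}{{Spaces}}(?<itemNo>{{Digits}}){{Spaces}}"
        "(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}"
        "(?<price>{{Number}}){{Spaces}}(?<qty>{{Digits}}){{Spaces}}"
        "(?<extPrice>{{Number}})"
    ),
}

# Converters mirroring the column data types (float stands in for decimal).
CASTS = {"itemNo": int, "description": str, "price": float, "qty": int, "extPrice": float}


def compile_expression(expression):
    """Expand {{Macro}} placeholders and compile the result."""
    pattern = re.sub(r"\{\{(\w+)\}\}", lambda m: MACROS[m.group(1)], expression)
    # Python spells named groups (?P<name>...) instead of (?<name>...).
    pattern = re.sub(r"\(\?<(\w+)>", r"(?P<\1>", pattern)
    return re.compile(pattern)


def extract_rows(pages):
    """Collect rows between the start and end markers across all pages."""
    start, end, row = (compile_expression(TEMPLATE[k]) for k in ("start", "end", "row"))
    rows, inside = [], False
    for page in pages:  # like "multipage": true, keep scanning on later pages
        for line in page.splitlines():
            if not inside:
                inside = bool(start.search(line))  # the header line opens the table
                continue
            if end.search(line):  # the TOTAL line closes the table
                return rows
            match = row.search(line)
            if match:
                rows.append({k: CASTS[k](v) for k, v in match.groupdict().items()})
    return rows


# Made-up sample text standing in for the two PDF pages.
pages = [
    "Item   Description   Price   Qty   Ext. Price\n"
    "  1   Widget A   10.00   2   20.00\n",
    "  2   Widget B large   5.50   1   5.50\n"
    "TOTAL   25.50\n",
]
for parsed in extract_rows(pages):
    print(parsed)
```

The row on the second page is still collected because the scan does not stop at the page boundary; it stops only when the end expression matches.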

Screenshot of editing template

Parsed Table Output

Here’s the output with the parsed table line items.

Screenshot of parsed table output

In this tutorial, you learned how to extract tables containing text from a PDF using PDF.co and Zapier. You also learned how to create a Document Parser template.
