Extract Tables with Text from PDF – Step-by-Step Guide

  1. Make A Zap
  2. Add App Event
  3. Add Action Event
  4. Connect PDF.co Account
  5. Document Parser Configuration
  6. Test Document Parser
  7. Test Result

In this tutorial, we will show you how to extract tables containing text from a PDF using PDF.co and Zapier.

We will use this 2-page sample PDF and extract the table items that span both pages.

Sample Source File

Step 1: Make A Zap

First, click on the Make a Zap button in the upper left corner of your dashboard.

Step 2: Add App Event

Now, select PDF.co as the App Event.

App Event

Step 3: Add Action Event

Under the Action Event, select the Document Parser to parse invoices and other documents using templates.

Document Parser

Step 4: Connect PDF.co Account

In the Choose Account field, select your PDF.co account.

PDF.co Account

Step 5: Document Parser Configuration

Now, we can set up the action.

  • In the Input field, enter the link to the source file. If the file is stored in Google Drive, Dropbox, or a similar service, set its sharing setting to "Anyone with the link" so that PDF.co can access it.
  • In the Template ID field, enter the ID of your Document Parser template.

Set Up The Action
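
For readers who want to see what these two fields correspond to outside Zapier, here is a minimal sketch of an equivalent request. The endpoint URL, the `url` and `templateId` field names, and the `x-api-key` header are assumptions and should be checked against the PDF.co API reference. The helper name `build_parser_request` and all sample values are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request

# Assumed endpoint and field names; check the PDF.co API reference before use.
PARSER_ENDPOINT = "https://api.pdf.co/v1/pdf/documentparser"


def build_parser_request(api_key: str, file_url: str, template_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the Input and Template ID fields."""
    payload = {"url": file_url, "templateId": template_id}
    return urllib.request.Request(
        PARSER_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration only.
request = build_parser_request("YOUR_API_KEY", "https://example.com/sample.pdf", "123")
print(json.loads(request.data))
```

Passing the request to `urllib.request.urlopen` would send it. Inside a Zap this step is not needed, because the PDF.co action handles it for you.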

Step 6: Test Document Parser

Now click Test & Review to send the Document Parser configuration to PDF.co and confirm that it is set up correctly.

Test Document Parser

Step 7: Test Result

Great! The test was successful. In the inline result, you will see all the table line items from 1 to 45. You can then turn the Zap on.

Test Result
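
If you pass the parsed rows on to a later step, a quick way to confirm that no line item was lost at the page break is to check that the item numbers run from 1 to 45 without gaps. The sketch below assumes each row is a dictionary with an `itemNo` key. The actual shape of the Zapier output may differ, and the helper name `missing_item_numbers` is hypothetical.

```python
def missing_item_numbers(rows, first=1, last=45):
    """Return the item numbers in first..last that do not appear in rows."""
    seen = {int(row["itemNo"]) for row in rows}
    return [n for n in range(first, last + 1) if n not in seen]


# Hypothetical parsed rows: items 1..45 with number 23 dropped.
rows = [{"itemNo": str(n)} for n in range(1, 46) if n != 23]
print(missing_item_numbers(rows))  # prints [23]
```

An empty list means every expected line item was found.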

Create Document Parser Template

We will show you how to create the template to extract a multi-page table.

  1. Load Test PDF
  2. Name and Regex
  3. Add TABLE Field Based on TEXT SEARCH
  4. Add Table Object Expression
  5. Parsed Table Output

1. Load Test PDF

Click the Load Test PDF or Image button to open the PDF document that you will use for your template.

Load Test PDF

2. Name and Regex

You can rename the object in the Name field. Make sure to check the Regex box when using any Text Search objects.

Total Name

3. Add TABLE Field Based on TEXT SEARCH

Now, click the Add Object button and select Add TABLE Field based on TEXT SEARCH.

Add Table Field

4. Add Table Object Expression

In the Edit Template window, add the start and end expressions, the row expression, and the column definitions. Then set multipage to true so that the table is followed across pages.

{
  "start": {
    "expression": "Item{{Spaces}}Description{{Spaces}}Price",
    "regex": true
  },
  "end": {
    "expression": "TOTAL{{Spaces}}{{Number}}",
    "regex": true
  },
  "row": {
    "expression": "{{LineStart}}{{Spaces}}(?<itemNo>{{Digits}}){{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<price>{{Number}}){{Spaces}}(?<qty>{{Digits}}){{Spaces}}(?<extPrice>{{Number}})",
    "regex": true
  },
  "columns": [
    {
      "name": "itemNo",
      "dataType": "integer"
    },
    {
      "name": "description",
      "dataType": "string"
    },
    {
      "name": "price",
      "dataType": "decimal"
    },
    {
      "name": "qty",
      "dataType": "integer"
    },
    {
      "name": "extPrice",
      "dataType": "decimal"
    }
  ],
  "multipage": true
}

Then run the template to make sure there are no errors.
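
To get a feel for how a row expression of this shape picks out one table line, you can approximate it locally. The sketch below replaces each `{{...}}` macro with a plain regular expression. These stand-ins are assumptions, not PDF.co's actual definitions, and the sample text is made up. Named groups carry the column names, and Python needs them written as `(?P<name>...)`.

```python
import re

# Assumed plain-regex stand-ins for the template macros; PDF.co's real
# definitions may differ.
MACROS = {
    "LineStart": r"^",
    "Spaces": r"[ \t]+",
    "Digits": r"\d+",
    "Number": r"-?\d[\d,]*(?:\.\d+)?",
    "SentenceWithSingleSpaces": r"\S+(?: \S+)*",
}

ROW_TEMPLATE = (
    "{{LineStart}}{{Spaces}}(?<itemNo>{{Digits}}){{Spaces}}"
    "(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}"
    "(?<price>{{Number}}){{Spaces}}(?<qty>{{Digits}}){{Spaces}}"
    "(?<extPrice>{{Number}})"
)


def expand(template: str) -> str:
    """Substitute the macros and convert named groups to Python syntax."""
    pattern = re.sub(r"\{\{(\w+)\}\}", lambda m: MACROS[m.group(1)], template)
    return re.sub(r"\(\?<(\w+)>", r"(?P<\1>", pattern)


def parse_rows(text: str) -> list[dict]:
    """Return one dict of column values per line that matches the row pattern."""
    row_re = re.compile(expand(ROW_TEMPLATE), re.MULTILINE)
    return [m.groupdict() for m in row_re.finditer(text)]


# Made-up sample text: a header, two rows, and a total line.
sample = (
    "Item   Description   Price   Qty   Ext\n"
    "    1   Blue widget with clip   12.50   3   37.50\n"
    "    2   Bolt   0.25   10   2.50\n"
    "TOTAL   40.00\n"
)
for row in parse_rows(sample):
    print(row)
```

Only the two indented item lines match. The header and the TOTAL line do not start with spaces followed by digits, so the row pattern skips them.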

Edit Table Expression

5. Parsed Table Output

Here is the output with the parsed table line items.

Source File Output

In this tutorial, you learned how to extract tables containing text from a PDF using PDF.co and Zapier. You also learned how to create a Document Parser template.
