How to Extract Tables Containing Text from PDF using PDF.co and Make
In this tutorial, we will show you how to extract tables containing text from a PDF using PDF.co and Make.
We will use this 2-page sample PDF and extract the table items that span both pages.
![Sample Source File](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2Fd4cf2300fd173f7b2fa34c005be806e7911ece61-649x659.png&w=1920&q=75)
Step 1: Create A Scenario
First, click on the Create A Scenario button in the upper right corner of your dashboard.
Step 2: Google Drive App
In the new scenario, choose Google Drive as the app for the first module.
![Google Drive App](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F3b5f8969479cc6929a3c110bb9bcd1ad82686dc7-1365x659.png&w=3840&q=75)
Step 3: Download a File
Under Google Drive, select Download a File so we can use the file in the PDF.co module.
![Download a File](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F635c0c7cc822929a0b60eea39e9d231b6324e015-1323x659.png&w=3840&q=75)
Step 4: Google Drive Connection
Under Connection, create a connection to your Google Drive account.
- In the Enter a File ID field, choose Select from the list.
- In the Choose a Drive field, select My Drive.
- In the File ID field, browse to the folder where your file resides and select the file.
![Google Drive Connection](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F4e1d9ccafeebd9d0ed5f766a17ccbc16d61e0dff-1357x647.png&w=3840&q=75)
Step 5: PDF.co App
Let’s add another module. This time select PDF.co so we can call the module to extract table contents.
![PDF.co App](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2Fdf95981e050087b53a68833a4250b2d7aaf0520c-1365x661.png&w=3840&q=75)
Step 6: Parse a Document
Under PDF.co, select the Parse a Document module to automatically read invoices, reports, orders, statements, and other documents with fields and tables.
![Parse a Document](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2Fd9bec02335c98fd7672e14493bb17f0f5dd867dc-1361x659.png&w=3840&q=75)
Step 7: Configure PDF.co Module
Under Connection, add your PDF.co connection.
- In the Input File field, select Import a file from URL.
- In the URL field, enter the URL of the source PDF, image, or document.
- In the Document Parser Template ID field, enter the ID of the template that should parse your file.
![Configure PDF.co Module](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2Fd81e577f8660e8e13f0d7c7e751c5fc9e3e1f042-1357x665.png&w=3840&q=75)
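For readers who want to see what the module's three fields amount to outside of Make, here is a minimal Python sketch that builds the equivalent HTTP request without sending it. The endpoint path, the `x-api-key` header, and the `url`, `templateId`, and `outputFormat` parameter names are assumptions based on PDF.co's public Web API and should be checked against the current API reference; the key, file URL, and template ID are placeholders.

```python
import json
import urllib.request

# Assumed endpoint for PDF.co's Document Parser; confirm against the current API docs.
API_URL = "https://api.pdf.co/v1/pdf/documentparser"


def build_parse_request(api_key: str, file_url: str, template_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the module's fields."""
    payload = {
        "url": file_url,            # "URL" field: link to the source PDF
        "templateId": template_id,  # "Document Parser Template ID" field
        "outputFormat": "JSON",     # assumed parameter name and value
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values, not real credentials or files.
    req = build_parse_request("YOUR_API_KEY", "https://example.com/sample.pdf", "1")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req), then json.load() the response.
```

Building the request separately from sending it makes it easy to inspect the payload before spending API credits.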
Step 8: Parsed Table Output
Run the scenario. Once the test succeeds, the inline result lists all the table line items, from 1 to 45.
![Parsed Table Output](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F445632ce89b8de72af50c27f6ce9bc24f9ac2294-1363x657.png&w=3840&q=75)
Step 9: Create Document Parser Template
Now we will show you how to create the template used above to extract a multi-page table.
Step 1: Load Test PDF
Click the Load Test PDF or Image button to open the PDF document that you will use for your template.
![Load Test PDF](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2Fab2bb8faee87ab59a96896ff9d0644343cb73272-1359x603.png&w=3840&q=75)
Step 2: Add FIELD based on TEXT SEARCH
Next, click on the Add Object button in the upper left corner and select Add FIELD based on TEXT SEARCH.
![Add FIELD based on TEXT SEARCH](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F3fe16c01213511645d2a05f173785b917d5a4858-1363x655.png&w=3840&q=75)
Step 3: Name and Regex
You can rename the object in the Name field. Make sure to check the Regex box whenever you use a Text Search object.
![Name and Regex](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F7deb70978b7bb7273a75a99fcd61564b997bb53a-1359x653.png&w=3840&q=75)
Step 4: Add Total Object Expression
You can add expressions either in the Expression field or directly in the Edit Template window.
Use the expression TOTAL{{Spaces}}({{Number}}) to parse the Total.
![Add Total Object Expression](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F6052943fd1e7dfc1083f8e26ac4d004a3d3f807e-1366x768.png&w=3840&q=75)
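To make the macro syntax less abstract, the sketch below expands the `{{...}}` placeholders into ordinary regular expressions and runs the result against a sample line. The expansions in `MACROS` are illustrative guesses, not PDF.co's actual macro definitions, and the sample line is made up.

```python
import re

# Hypothetical stand-ins for the template macros; PDF.co's real definitions may differ.
MACROS = {
    "Spaces": r"[ \t]+",
    "Number": r"[0-9][0-9,]*(?:\.[0-9]+)?",
}


def expand(expression: str) -> str:
    """Replace each {{Name}} placeholder with its regex from MACROS."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: MACROS[m.group(1)], expression)


pattern = re.compile(expand("TOTAL{{Spaces}}({{Number}})"))

# Made-up line resembling an invoice footer.
match = pattern.search("TOTAL          1,234.50")
total = match.group(1) if match else None
print(total)
```

The parentheses around `{{Number}}` form a capture group, so only the amount is returned as the field value rather than the whole matched line.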
Step 5: Add TABLE Field Based on TEXT SEARCH
Now, click on the Add Object button and select Add TABLE Field based on TEXT SEARCH.
![Add TABLE Field Based on TEXT SEARCH](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F21fd69959b9bda2f15204fc56aec62f552f777a3-1359x659.png&w=3840&q=75)
Step 6: Add Table Object Expression
In the Edit Template window, add the start and end expressions along with the row expression and the column definitions. Then set multipage to true so the table continues across pages.
```json
{
  "start": {
    "expression": "Item{{Spaces}}Description{{Spaces}}Price",
    "regex": true
  },
  "end": {
    "expression": "TOTAL{{Spaces}}{{Number}}",
    "regex": true
  },
  "row": {
    "expression": "{{LineStart}}{{Spaces}}(?<itemNo>{{Digits}}){{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<price>{{Number}}){{Spaces}}(?<qty>{{Digits}}){{Spaces}}(?<extPrice>{{Number}})",
    "regex": true
  },
  "columns": [
    {
      "name": "itemNo",
      "dataType": "integer"
    },
    {
      "name": "description",
      "dataType": "string"
    },
    {
      "name": "price",
      "dataType": "decimal"
    },
    {
      "name": "qty",
      "dataType": "integer"
    },
    {
      "name": "extPrice",
      "dataType": "decimal"
    }
  ],
  "multipage": true
}
```
Then run the template to make sure there are no errors.
![Add Table Object Expression](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F03db7594cad2fd4bd042ad598194b67671b4ae2e-1360x654.png&w=3840&q=75)
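The following Python sketch imitates, in a simplified way, what a table object with start, end, and row expressions does when multipage is enabled: it starts collecting after the header line, keeps going across the page break, and stops at the TOTAL line. It is not PDF.co's implementation. The regexes are plain-regex stand-ins for the macros, and the two-page text is invented sample data.

```python
import re
from decimal import Decimal

# Plain-regex stand-ins for the template macros (assumptions, not PDF.co's definitions).
START = re.compile(r"Item\s+Description\s+Price")
END = re.compile(r"TOTAL\s+[\d.,]+")
ROW = re.compile(
    r"^\s*(?P<itemNo>\d+)\s+(?P<description>\S+(?: \S+)*)\s+"
    r"(?P<price>\d+\.\d+)\s+(?P<qty>\d+)\s+(?P<extPrice>\d+\.\d+)"
)
# Mirrors the "columns" list: column name -> converter for its dataType.
COLUMNS = {"itemNo": int, "description": str, "price": Decimal,
           "qty": int, "extPrice": Decimal}


def extract_rows(pages):
    """Collect rows between START and END, continuing across page breaks."""
    rows, inside = [], False
    for page in pages:
        for line in page.splitlines():
            if not inside:
                # Skip everything up to and including the header line.
                inside = bool(START.search(line))
                continue
            if END.search(line):
                return rows
            m = ROW.match(line)
            if m:  # lines that are not table rows (e.g. page headers) are ignored
                rows.append({name: conv(m.group(name)) for name, conv in COLUMNS.items()})
    return rows


# Invented two-page text; the table continues on the second page.
pages = [
    "\n".join([
        "INVOICE",
        "Item  Description        Price  Qty  Ext",
        " 1    Blue widget        2.50   4    10.00",
        " 2    Red widget         1.25   2    2.50",
    ]),
    "\n".join([
        "Page 2",
        " 3    Green widget       3.00   1    3.00",
        "TOTAL   15.50",
    ]),
]

rows = extract_rows(pages)
for row in rows:
    print(row)
```

The named groups in the row pattern line up one-to-one with the column names, which is how each captured value ends up in the right column with the right data type.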
Step 7: Parsed Table Output
Here is the output with the parsed table line items.
![Source File Output](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2Fa4e889bc8e841a9fa6c0edd54d2659b331b62d2d-1361x661.png&w=3840&q=75)
In this tutorial, you learned how to extract tables containing text from a PDF using the PDF.co Parse a Document module in Make (formerly Integromat). You also learned how to create a Document Parser template.