In this tutorial, we will show you how to extract a table with multiple subitems using the PDF.co Document Parser.

  1. Open PDF.co Account
  2. Create New Template
  3. Load Test PDF
  4. Edit Raw Template
  5. Run Template
  6. Parser Output
  7. Save Template and Return

We will use this sample PDF document in this tutorial.

Screenshot of Table with Multiple Subitems

Step 1: Open PDF.co Account

Open PDF.co Account

Step 2: Create New Template

  • Next, click on the New Template button to create a new template.

Create New Template

Step 3: Load Test PDF

  • In the Template Editor, click on the Load Test PDF button to load your source file.

Load Test PDF

Step 4: Edit Raw Template

  • Then, click on the Edit Raw Template button, add or edit your template code, and click Update to load the template into the visual editor.

Edit Raw Template

{
  "templateName": "template",
  "templateVersion": "4",
  "templatePriority": 0,
  "detectionRules": {
    "keywords": []
  },
  "objects": [
    {
      "name": "Doc No",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Doc No{{Spaces}}(?<value>{{Digits}})",
        "regex": true
      }
    },
    {
      "name": "Doc Date",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Doc Date{{Spaces}}(?<value>{{SmartDate}})",
        "regex": true
      }
    },
    {
      "name": "TableItems",
      "objectType": "table",
      "tableProperties": {
        "start": {
          "expression": "Description{{Spaces}}Quantity{{Spaces}}Price",
          "regex": true
        },
        "end": {
          "expression": "{{EndOfPage}}",
          "regex": true
        },
        "row": {
          "expression": "{{LineStart}}{{Spaces}}(?<No>{{Digits}}){{Spaces}}(?<Description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<Quantity>{{Number}}{{Spaces}}{{Letters}}){{Spaces}}(?<Price>{{Number}}{{Spaces}}{{Letters}}/{{Letters}}){{Spaces}}(?<Amount>{{Number}}{{Spaces}}{{Letters}})",
          "regex": true,
          "subExpression1": "{{LineStart}}{{Spaces}}(?{{SentenceWithSingleSpaces}})"
        },
        "columns": [
          {
            "name": "No",
            "dataType": "integer"
          },
          {
            "name": "Description",
            "dataType": "string"
          },
          {
            "name": "Quantity",
            "dataType": "string"
          },
          {
            "name": "Price",
            "dataType": "string"
          },
          {
            "name": "Amount",
            "dataType": "string"
          }
        ]
      }
    }
  ]
}
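
To get a feel for how the row expression and subExpression1 work together, here is a rough Python sketch. It is not how PDF.co implements the parser: the patterns standing in for the macros are simplified guesses, the group names (including `SubItem`) and the sample lines are made up for illustration, and attaching each sub-line to the main row before it is an assumption about how subitems are grouped.

```python
import re

# Simplified stand-ins for the PDF.co macros (assumptions, not the real expansions).
SPACES = r"\s+"
DIGITS = r"\d+"
NUMBER = r"\d+(?:\.\d+)?"
LETTERS = r"[A-Za-z]+"
SENTENCE = r"\S+(?: \S+)*"  # words separated by single spaces

# Main row: No, Description, Quantity, Price, Amount (mirrors the template's row expression).
ROW = re.compile(
    rf"^\s*(?P<No>{DIGITS}){SPACES}(?P<Description>{SENTENCE}){SPACES}"
    rf"(?P<Quantity>{NUMBER}{SPACES}{LETTERS}){SPACES}"
    rf"(?P<Price>{NUMBER}{SPACES}{LETTERS}/{LETTERS}){SPACES}"
    rf"(?P<Amount>{NUMBER}{SPACES}{LETTERS})\s*$"
)

# Sub-row: an indented line holding only a sentence (mirrors subExpression1).
# The group name "SubItem" is hypothetical.
SUB_ROW = re.compile(rf"^\s+(?P<SubItem>{SENTENCE})\s*$")


def parse_table(lines):
    """Collect main rows and attach each sub-line to the row that precedes it."""
    rows = []
    for line in lines:
        main = ROW.match(line)
        if main:
            rows.append({**main.groupdict(), "SubItems": []})
            continue
        sub = SUB_ROW.match(line)
        if sub and rows:
            rows[-1]["SubItems"].append(sub.group("SubItem"))
    return rows


# Made-up lines in the spirit of the sample document.
SAMPLE = [
    "1   Steel pipe   10 pcs   2.50 USD/pcs   25.00 USD",
    "      galvanized coating",
    "      cut to length",
    "2   Copper wire   5 m   1.20 USD/m   6.00 USD",
    "      insulated",
]
```

Calling `parse_table(SAMPLE)` returns one dictionary per numbered row, each carrying its own `SubItems` list. The main expression is tried first, so a line is treated as a subitem only when it does not look like a full row.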

Step 5: Run Template

  • With the template code in place, you can now run the template.

Run Template
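
Before running a template, it can help to sanity-check it locally. The sketch below is only an illustration: it assumes that the row expression uses .NET-style named groups such as `(?<No>...)` and that each column name should appear as a group name. The helper `check_table` and the `EXAMPLE` template are hypothetical and not part of PDF.co.

```python
import json
import re

# Finds .NET-style named groups such as (?<No>...). Treating group names as
# column names is an assumption made for this sketch.
NAMED_GROUP = re.compile(r"\(\?<([A-Za-z_]\w*)>")


def check_table(template_text):
    """Return a list of problems found in the table objects of a template."""
    template = json.loads(template_text)  # raises ValueError on malformed JSON
    problems = []
    for obj in template.get("objects", []):
        if obj.get("objectType") != "table":
            continue
        props = obj.get("tableProperties", {})
        groups = NAMED_GROUP.findall(props.get("row", {}).get("expression", ""))
        columns = [col["name"] for col in props.get("columns", [])]
        for name in columns:
            if name not in groups:
                problems.append(f"{obj['name']}: column '{name}' has no matching group")
    return problems


# Hypothetical, deliberately incomplete template: "Amount" has no group.
EXAMPLE = json.dumps({
    "objects": [{
        "name": "TableItems",
        "objectType": "table",
        "tableProperties": {
            "row": {
                "expression": "(?<No>{{Digits}}){{Spaces}}(?<Description>{{SentenceWithSingleSpaces}})"
            },
            "columns": [{"name": "No"}, {"name": "Description"}, {"name": "Amount"}],
        },
    }]
})
```

Here `check_table(EXAMPLE)` reports the `Amount` column, because no group in the row expression carries that name. An empty list means every column found a group.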

Step 6: Parser Output

  • Here’s the extracted output from the table with multiple subitems.

Parser Output

Step 7: Save Template and Return

  • Now, click on the Save Template and Return buttons to save the template.

Save Template and Return

In this tutorial, you learned how to extract a table with multiple subitems using PDF.co Document Parser. You also learned how to use the Document Parser Template Editor to create a new template.
