Automating Financial Report Extraction with n8n, PDF.co, and Google Drive
What You'll Build
By the end of this tutorial, you’ll have an automated workflow that:
- Monitors your email inbox for new messages with the subject financial_report.
- Downloads attached PDF files automatically.
- Uploads the files to PDF.co for processing.
- Searches PDFs for invoices and identifies the correct pages.
- Splits the PDFs to extract invoices only.
- Parses customer details from the extracted documents.
- Merges invoices into a single PDF.
- Converts the result into a binary file.
- Uploads the final processed invoices into Google Drive for secure storage.
This automation ensures that invoices embedded in bulk financial reports are extracted, cleaned, and saved automatically—no more manual searching or splitting.
Prerequisites
Before you begin, make sure you have:
- A PDF.co API Key → Get yours here
- An email inbox with IMAP access (e.g., Gmail, Outlook, or custom domain)
- A Google Drive account (with OAuth2 credentials for n8n)
- An n8n instance (cloud-hosted or self-hosted)
Quick Start Options
Option A: Import and Run Immediately
- Import the provided workflow JSON.
- Connect your IMAP (email), PDF.co, and Google Drive credentials.
- Test with a sample financial_report.pdf.
- Activate the workflow.
Option B: Build It Step-by-Step
Follow the detailed steps below to recreate the automation from scratch.
Automation Overview
Here’s what the finished workflow does:
- Email Trigger (IMAP) – Watches for new unread emails with subject “financial_report” and downloads attachments.
- Upload File to PDF.co – Sends the attachment securely to PDF.co and generates a file URL.
- Find Invoices – Scans the PDF for the text Invoice No.
- Code Node – Extracts matching page numbers and formats them as CSV.
- Split File to Get Invoices – Splits the original PDF into only the invoice pages.
- Parse Customer’s Name – Extracts customer details from the invoice.
- Merge Files – Combines multiple invoices into one PDF.
- Convert to Binary File – Prepares the merged file for Google Drive.
- Upload to Google Drive – Saves the final invoices in a secure Google Drive folder.
Step 1: Trigger on Financial Reports
Node: Email Trigger (IMAP)
Settings:
- Mailbox Name: INBOX
- Action: Mark as Read
- Download Attachments: True
- Custom Email Rules: ["UNSEEN", ["SUBJECT", "financial_report"]]
Success looks like: When a new, unread email with the subject financial_report arrives, the workflow is triggered and the attached PDF is pulled in.
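To make the rule concrete, here is a minimal sketch of how the criteria select messages. It is an illustration only, not how n8n or your mail server implements the search. The message shape ({ seen, subject }) and the matchesRule helper are hypothetical. The sketch assumes the case-insensitive substring matching that the IMAP SEARCH specification describes for string keys; individual mail servers may behave differently.

```javascript
// Illustrative only: mimics how ["UNSEEN", ["SUBJECT", "financial_report"]]
// selects messages. The message shape ({ seen, subject }) is hypothetical.
function matchesRule(message, rule) {
  // Every criterion in the rule must hold (criteria are ANDed together).
  return rule.every((criterion) => {
    if (criterion === "UNSEEN") {
      return !message.seen;
    }
    if (Array.isArray(criterion) && criterion[0] === "SUBJECT") {
      // Assumed behaviour: case-insensitive substring match on the subject.
      const needle = String(criterion[1]).toLowerCase();
      return String(message.subject || "").toLowerCase().includes(needle);
    }
    throw new Error(`Unsupported criterion: ${JSON.stringify(criterion)}`);
  });
}

const rule = ["UNSEEN", ["SUBJECT", "financial_report"]];
const inbox = [
  { seen: false, subject: "financial_report Q3" },
  { seen: true, subject: "financial_report Q2" },
  { seen: false, subject: "Lunch on Friday" },
  { seen: false, subject: "Fwd: FINANCIAL_REPORT (final)" },
];

const matched = inbox.filter((message) => matchesRule(message, rule));
console.log(matched.map((message) => message.subject));
// [ 'financial_report Q3', 'Fwd: FINANCIAL_REPORT (final)' ]
```

Because the match is on a substring, replies and forwards that keep financial_report in the subject would also trigger the workflow, so a distinctive subject keyword helps avoid accidental runs.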
Step 2: Upload File to PDF.co
Node: PDF.co API → Upload File
Settings:
- Operation: Upload File to PDF.co
- Binary Data: True
- Binary Property Name: file0
- File Name: ={{ $json.subject }}
Success looks like: The PDF financial report is uploaded to PDF.co and returns a secure file URL.
Step 3: Search for Invoices
Node: PDF.co API → Search in PDF
Settings:
- URL: ={{ $json.url }} (from Step 2)
- Search String: "Invoice No"
Success looks like: PDF.co returns an array of matches for the text Invoice No, each entry carrying the pageIndex of the page where it was found.
Step 4: Extract Page Numbers
Node: Code (JavaScript)
// Get the body array from the previous node
const body = $input.first().json.body || [];
// Extract pageIndex values
const pageIndexes = body.map(entry => entry.pageIndex);
// Convert to human-readable page numbers (index + 1), remove duplicates, and sort
const pageNumbers = [...new Set(pageIndexes.map(i => i + 1))].sort((a, b) => a - b);
// Return result
return [{
  json: {
    pageNumbers: pageNumbers, // e.g. [1, 4, 5]
    pageNumbersCsv: pageNumbers.join(",") // e.g. "1,4,5" for APIs that need a string
  }
}];
Logic:
- Takes the array of pageIndex values from Step 3.
- Converts them into human-readable page numbers (index + 1).
- Removes duplicates and sorts them.
Exports results as:
{
"pageNumbers": [1, 4, 5],
"pageNumbersCsv": "1,4,5"
}
Success looks like: Invoice page numbers are extracted and formatted for the Split API.
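If you want to check this logic outside n8n before wiring it in, the same steps can be wrapped in a plain function and run with Node.js. This is a sketch under assumptions: the toPageNumbers name and the sample entries are hypothetical, and the extra filtering of missing or non-integer pageIndex values is a defensive addition, not part of the original node.

```javascript
// Standalone version of the Step 4 logic, for testing with sample data.
// `body` is assumed to be an array of search matches with a zero-based pageIndex.
function toPageNumbers(body) {
  const entries = Array.isArray(body) ? body : [];
  const indexes = entries
    .map((entry) => (entry ? entry.pageIndex : undefined))
    // Defensive addition: skip entries without a usable zero-based index.
    .filter((index) => Number.isInteger(index) && index >= 0);
  // Convert to one-based page numbers, drop duplicates, sort ascending.
  const pageNumbers = [...new Set(indexes.map((index) => index + 1))].sort(
    (a, b) => a - b
  );
  return { pageNumbers, pageNumbersCsv: pageNumbers.join(",") };
}

// Hypothetical sample: "Invoice No" found on pages 4, 1, 5, and again on 4.
const sampleBody = [
  { text: "Invoice No", pageIndex: 3 },
  { text: "Invoice No", pageIndex: 0 },
  { text: "Invoice No", pageIndex: 4 },
  { text: "Invoice No", pageIndex: 3 },
];

const result = toPageNumbers(sampleBody);
console.log(result); // { pageNumbers: [ 1, 4, 5 ], pageNumbersCsv: '1,4,5' }
```

An empty or missing body yields an empty pageNumbersCsv, which is worth checking for before calling the Split step.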
Step 5: Split PDF to Extract Invoices
Node: PDF.co API → Split PDF
Settings:
- URL: ={{ $('Assign URLs to Binary Files').item.json.url }} (the name inside $('...') must exactly match the name of your Step 2 upload node)
- Pages: ={{ $json.pageNumbersCsv }}
Success looks like: A new set of PDF files, each containing only invoice pages, is returned.
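An optional variation, sketched under assumptions: if an invoice spans several consecutive pages that all contain Invoice No, a page list such as 1,4,5 would split it across separate files. The hypothetical toPageRanges helper below groups consecutive pages into ranges (1,4-5) instead. It assumes the Pages field accepts dash ranges, so confirm that against the PDF.co Split documentation first, and it only makes sense when consecutive matches really belong to the same invoice.

```javascript
// Hypothetical helper: collapse consecutive page numbers into dash ranges.
// Assumption: the Split step's Pages field accepts values like "1,4-5".
function toPageRanges(pageNumbers) {
  const sorted = [...new Set(pageNumbers)].sort((a, b) => a - b);
  const ranges = [];
  let start = null;
  let prev = null;

  // Format a run of pages as "n" or "n-m".
  const formatRun = (from, to) => (from === to ? `${from}` : `${from}-${to}`);

  for (const page of sorted) {
    if (start === null) {
      start = prev = page; // first page opens the first run
    } else if (page === prev + 1) {
      prev = page; // extends the current run
    } else {
      ranges.push(formatRun(start, prev)); // gap found: close the run
      start = prev = page;
    }
  }
  if (start !== null) {
    ranges.push(formatRun(start, prev)); // close the final run
  }
  return ranges.join(",");
}

console.log(toPageRanges([1, 4, 5])); // 1,4-5
console.log(toPageRanges([2, 3, 4, 9])); // 2-4,9
```

If every invoice in your reports fits on a single page, the plain comma-separated list from Step 4 is the simpler choice.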
Step 6: Parse Customer’s Name
Node: PDF.co API → Parse
Settings:
- URL: ={{ $json.body[0] }}
Success looks like: Customer information (e.g., company name, contact details) is extracted from the invoice.
Step 7: Merge Files
Node: PDF.co API → Merge PDF
Settings:
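- Input URLs (one entry per split file; the two entries below cover only the first two files, and the name inside $('...') must exactly match your Split node's name):
={{ $('Split File to get Invoices').item.json.body[0] }}
={{ $('Split File to get Invoices').item.json.body[1] }}
- Output File Name: {{$json.body.vendor.name}}_merged_invoice.pdf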
Success looks like: The split invoices are combined into a single PDF.
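When the number of invoices varies between reports, listing body[0] and body[1] by hand will miss any further files. One possible approach, sketched here with a hypothetical buildMergeInput helper, is to collect every split URL in a Code node and build the output file name defensively. The sample URLs and vendor name are made up, and whether the Merge node expects an array or a comma-separated string is an assumption to verify in your n8n version.

```javascript
// Hypothetical helper: gather all split-file URLs and build a safe file name.
function buildMergeInput(splitUrls, vendorName) {
  const urls = (Array.isArray(splitUrls) ? splitUrls : []).filter(
    (url) => typeof url === "string" && url.trim() !== ""
  );
  if (urls.length === 0) {
    throw new Error("No split files to merge");
  }
  // Fall back to a placeholder when the parsed vendor name is missing or blank.
  const rawName = typeof vendorName === "string" ? vendorName.trim() : "";
  // Replace runs of characters that are awkward in file names with "_".
  const safeVendor = (rawName || "unknown_vendor").replace(/[^A-Za-z0-9._-]+/g, "_");
  return {
    urls,
    urlsCsv: urls.join(","), // for inputs that expect a comma-separated string
    fileName: `${safeVendor}_merged_invoice.pdf`,
  };
}

const merged = buildMergeInput(
  [
    "https://example.com/invoice-1.pdf",
    "https://example.com/invoice-2.pdf",
    "https://example.com/invoice-3.pdf",
  ],
  "Acme & Sons / Ltd."
);
console.log(merged.urls.length); // 3
console.log(merged.fileName); // Acme_Sons_Ltd._merged_invoice.pdf
```

Inside an n8n Code node, the first argument would come from the Split node's body array and the second from the parsed vendor name in Step 6.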
Step 8: Convert to Binary File
Node: HTTP Request
Settings:
- URL: ={{ $json.url }}
- Batch Size: 1
- Batch Interval: 500 ms
Success looks like: The merged invoice is downloaded from its URL as binary data, ready for upload.
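A quick sanity check on the downloaded data can catch cases where the request returned an error page instead of the file. PDF files begin with the bytes %PDF-, so a sketch like the following can verify that before the Google Drive upload. The looksLikePdf helper and the sample byte arrays are hypothetical; how you read the bytes from the binary property depends on your n8n setup.

```javascript
// Hypothetical helper: check whether a byte array starts with the PDF signature.
function looksLikePdf(bytes) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (!bytes || bytes.length < signature.length) {
    return false;
  }
  return signature.every((byte, index) => bytes[index] === byte);
}

// Made-up samples: the start of a PDF and an HTML error page.
const pdfBytes = new TextEncoder().encode("%PDF-1.7\n");
const htmlBytes = new TextEncoder().encode("<html>Access denied</html>");

console.log(looksLikePdf(pdfBytes)); // true
console.log(looksLikePdf(htmlBytes)); // false
```

The check accepts any array-like sequence of bytes, including a Node.js Buffer or a Uint8Array.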
Step 9: Upload to Google Drive
Node: Google Drive
Settings:
- Drive: My Drive
- Folder: OUTPUT1 (Google Drive folder link)
Success looks like: The final invoice PDF is saved to your Google Drive folder.
Step 10: Final End-to-End Test
- Send an email with the subject financial_report and a PDF attachment.
- Execute the workflow in n8n.
- Check your Google Drive folder → the extracted and merged invoices should be saved.
Congratulations!
You’ve successfully automated financial report invoice extraction. Your workflow now:
- Captures financial reports from email.
- Finds invoice pages inside PDFs.
- Extracts customer data.
- Splits and merges invoices.
- Saves the result to Google Drive.
This setup can be extended further—for example, sending extracted invoice data to Airtable, QuickBooks, or Slack for notifications.
Built something cool? Share it with us on Twitter @pdfdotco!