Automating Financial Report Extraction with n8n, PDF.co, and Google Drive

Oct 9, 2025·6 Minutes Read

What You'll Build

By the end of this tutorial, you’ll have an automated workflow that:

Monitors your email inbox for new messages with the subject financial_report.
Downloads attached PDF files automatically.
Uploads the files to PDF.co for processing.
Searches PDFs for invoices and identifies the correct pages.
Splits the PDFs to extract invoices only.
Parses customer details from the extracted documents.
Merges invoices into a single PDF.
Converts the result into a binary file.
Uploads the final processed invoices into Google Drive for secure storage.

This automation extracts invoices embedded in bulk financial reports, merges them, and saves them automatically, with no manual searching or splitting.

Prerequisites

Before you begin, make sure you have:

A PDF.co API key (from your PDF.co account)
An email inbox with IMAP access (e.g., Gmail, Outlook, or custom domain)
A Google Drive account (with OAuth2 credentials for n8n)
An n8n instance (cloud-hosted or self-hosted)

Quick Start Options

Option A: Import and Run Immediately

Import the provided workflow JSON.
Connect your IMAP (email), PDF.co, and Google Drive credentials.
Test with a sample financial_report.pdf.
Activate the workflow.

Option B: Build It Step-by-Step

Follow the detailed steps below to recreate the automation from scratch.

Automation Overview

Here’s what the finished workflow does:

Email Trigger (IMAP) – Watches for new unread emails with subject “financial_report” and downloads attachments.
Upload File to PDF.co – Sends the attachment securely to PDF.co and generates a file URL.
Find Invoices – Scans the PDF for the text Invoice No.
Code Node – Extracts matching page numbers and formats them as CSV.
Split File to Get Invoices – Splits the original PDF into only the invoice pages.
Parse Customer’s Name – Extracts customer details from the invoice.
Merge Files – Combines multiple invoices into one PDF.
Convert to Binary File – Prepares the merged file for Google Drive.
Upload to Google Drive – Saves the final invoices in a secure Google Drive folder.

IN THIS TUTORIAL

Trigger on Financial Reports
Upload File to PDF.co
Search for Invoices
Extract Page Numbers
Split PDF to Extract Invoices
Parse Customer’s Name
Merge Files
Convert to Binary File
Upload to Google Drive
Final End-to-End Test

Step 1: Trigger on Financial Reports

Node: Email Trigger (IMAP)

Settings:

Mailbox Name: INBOX
Action: Mark as Read
Download Attachments: True

Custom Email Rules: ["UNSEEN", ["SUBJECT", "financial_report"]]

Success looks like: When a new/unread email with subject financial_report arrives, the workflow is triggered and the attached PDF is pulled in.
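
To make the rule concrete, here is a minimal sketch of how the criteria ["UNSEEN", ["SUBJECT", "financial_report"]] select messages. The matchesRules helper and the message objects are hypothetical and exist only for illustration; in the real workflow the mail server evaluates the search. The sketch assumes standard IMAP semantics: top-level criteria are combined with AND, and SUBJECT is a case-insensitive substring match.

```javascript
// Hypothetical helper: mimics how the custom email rules select messages.
// It is not part of n8n or of any IMAP library.
function matchesRules(message, rules) {
  // Every top-level criterion must hold (criteria are ANDed).
  return rules.every((rule) => {
    if (rule === "UNSEEN") {
      // UNSEEN: the message does not carry the \Seen flag.
      return !message.flags.includes("\\Seen");
    }
    if (Array.isArray(rule) && rule[0] === "SUBJECT") {
      // SUBJECT: case-insensitive substring match on the subject line.
      const needle = String(rule[1]).toLowerCase();
      return message.subject.toLowerCase().includes(needle);
    }
    throw new Error(`Unsupported rule in this sketch: ${JSON.stringify(rule)}`);
  });
}

const rules = ["UNSEEN", ["SUBJECT", "financial_report"]];

// Illustrative inbox contents (made up for this example).
const inbox = [
  { subject: "financial_report", flags: [] },
  { subject: "Fwd: Financial_Report Q3", flags: [] },
  { subject: "financial_report", flags: ["\\Seen"] },
  { subject: "Lunch on Friday?", flags: [] },
];

const triggering = inbox.filter((m) => matchesRules(m, rules));
console.log(triggering.map((m) => m.subject));
// → [ 'financial_report', 'Fwd: Financial_Report Q3' ]
```

Because the subject match is a substring match, forwarded or prefixed subjects also trigger the workflow, while messages that were already read do not.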

Step 2: Upload File to PDF.co

Node: PDF.co API → Upload File

Settings:

Operation: Upload File to PDF.co
Binary Data: True
Binary Property Name: file0
File Name: ={{ $json.subject }}

Success looks like: The PDF financial report is uploaded to PDF.co and returns a secure file URL.
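
Because the File Name is taken from the email subject, the uploaded file can end up with no .pdf extension or with characters that are awkward in file names. If that matters in your setup, a small normalizer along these lines could run in a Code node before the upload. The toPdfFileName helper and its fallback name are assumptions made for this sketch, not part of n8n or PDF.co.

```javascript
// Hypothetical helper: turns an email subject into a safe PDF file name.
function toPdfFileName(subject) {
  const base = String(subject || "")
    .trim()
    // Collapse runs of whitespace and characters that are unsafe in file names.
    .replace(/[\\\/:*?"<>|\s]+/g, "_");
  // Fall back to a default name when the subject is empty (assumed default).
  const name = base || "financial_report";
  // Append the extension only if it is not already there.
  return /\.pdf$/i.test(name) ? name : `${name}.pdf`;
}

console.log(toPdfFileName("financial_report"));      // → financial_report.pdf
console.log(toPdfFileName("Fwd: financial report")); // → Fwd_financial_report.pdf
```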

Step 3: Search for Invoices

Node: PDF.co API → Search in PDF

Settings:

URL: ={{ $json.url }} (from Step 2)
Search String: "Invoice No"

Success looks like: PDF.co returns an array of matches, one for each occurrence of the text Invoice No, and each match carries the pageIndex of the page it was found on.

Step 4: Extract Page Numbers

Node: Code (JavaScript)

// Get the array of search matches returned by the previous node (Step 3)
const rawBody = $input.first().json.body;
const body = Array.isArray(rawBody) ? rawBody : [];
// Extract the zero-based pageIndex of each match, skipping malformed entries
const pageIndexes = body.map(entry => entry?.pageIndex).filter(Number.isInteger);
// Convert to one-based page numbers (index + 1), remove duplicates, and sort ascending
const pageNumbers = [...new Set(pageIndexes.map(i => i + 1))].sort((a, b) => a - b);
// Return a single item with both representations
return [{
  json: {
    pageNumbers: pageNumbers,             // e.g. [1, 4, 5]
    pageNumbersCsv: pageNumbers.join(",") // e.g. "1,4,5" for APIs that need a string
  }
}];

Logic:

Takes the array of pageIndex values from Step 3.
Converts them into human-readable page numbers (index + 1).
Removes duplicates and sorts them.

Exports results as:

{
  "pageNumbers": [1, 4, 5],
  "pageNumbersCsv": "1,4,5"
}

Success looks like: Invoice page numbers are extracted and formatted for the Split API.
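
To try the same logic outside n8n, it can be wrapped in a plain function and fed a sample input. This is a sketch under assumptions: the extractPageNumbers name and the sample matches are made up for illustration, and only the pageIndex field is taken from the workflow above.

```javascript
// Standalone version of the Step 4 logic, runnable with plain Node.js.
function extractPageNumbers(body) {
  // Tolerate a missing or non-array body by treating it as "no matches".
  const matches = Array.isArray(body) ? body : [];
  const pages = matches
    .map((m) => (m ? m.pageIndex : undefined))
    .filter(Number.isInteger) // skip entries without a usable pageIndex
    .map((i) => i + 1);       // zero-based index -> one-based page number
  // Remove duplicates and sort ascending.
  const pageNumbers = [...new Set(pages)].sort((a, b) => a - b);
  return { pageNumbers, pageNumbersCsv: pageNumbers.join(",") };
}

// Hypothetical search result: the "text" field is illustrative only.
const sampleBody = [
  { text: "Invoice No", pageIndex: 3 },
  { text: "Invoice No", pageIndex: 0 },
  { text: "Invoice No", pageIndex: 4 },
  { text: "Invoice No", pageIndex: 3 }, // second hit on the same page
];

const result = extractPageNumbers(sampleBody);
console.log(result);
// → { pageNumbers: [ 1, 4, 5 ], pageNumbersCsv: '1,4,5' }
```

The duplicate hit on page index 3 collapses into a single page 4, which is why the Set is applied before sorting.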

Step 5: Split PDF to Extract Invoices

Node: PDF.co API → Split PDF

Settings:

URL: ={{ $('Upload File to PDF.co').item.json.url }} (the file URL from Step 2; the name inside $('...') must match your upload node's name exactly)
Pages: ={{ $json.pageNumbersCsv }}

Success looks like: A new set of PDF files, each containing only invoice pages, is returned.
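
If an invoice spans several consecutive pages, you may prefer to keep those pages together in one output file. The sketch below collapses the page numbers from Step 4 into ranges. It assumes the Pages field accepts ranges such as 4-5 alongside single pages, which you should confirm against the PDF.co documentation before relying on it. The toPageRanges helper is hypothetical.

```javascript
// Hypothetical helper: turns [1, 4, 5] into "1,4-5" so that consecutive
// pages are grouped into a single range.
function toPageRanges(pageNumbers) {
  const sorted = [...new Set(pageNumbers)].sort((a, b) => a - b);
  const ranges = [];
  for (const page of sorted) {
    const last = ranges[ranges.length - 1];
    if (last && page === last.end + 1) {
      last.end = page; // extend the current run of consecutive pages
    } else {
      ranges.push({ start: page, end: page }); // start a new run
    }
  }
  return ranges
    .map((r) => (r.start === r.end ? `${r.start}` : `${r.start}-${r.end}`))
    .join(",");
}

console.log(toPageRanges([1, 4, 5]));    // → 1,4-5
console.log(toPageRanges([2, 3, 4, 7])); // → 2-4,7
```

Note that two separate one-page invoices on adjacent pages would also be grouped. Since Step 7 merges everything into one PDF anyway, this affects only the intermediate files.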

Step 6: Parse Customer’s Name

Node: PDF.co API → Parse

Settings:

URL: ={{ $json.body[0] }}

Success looks like: Customer information (e.g., company name, contact details) is extracted from the invoice.

Step 7: Merge Files

Node: PDF.co API → Merge PDF

Settings:

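Input URLs (the name inside $('...') must match the name of your Step 5 node exactly, including capitalization):

={{ $('Split File to get Invoices').item.json.body[0] }}
={{ $('Split File to get Invoices').item.json.body[1] }}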

Output File Name: ={{ $json.body.vendor.name }}_merged_invoice.pdf

Success looks like: Multiple invoices are combined into a single PDF.

Note: See the PDF Merge endpoint in the PDF.co API docs for more options.
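
The settings above reference body[0] and body[1] directly, so they cover exactly two split files. When a report contains more invoices, the inputs could be built from the whole array instead. The following is a sketch under assumptions: buildMergeInput is a hypothetical helper, the output field names url and name are chosen for illustration, the comma-separated URL format should be checked against the PDF.co docs, and the unknown_vendor fallback is made up.

```javascript
// Hypothetical helper: builds merge inputs from every split file URL.
function buildMergeInput(splitBody, parsed) {
  // Keep only non-empty string URLs from the Split step's body array.
  const urls = (Array.isArray(splitBody) ? splitBody : []).filter(
    (u) => typeof u === "string" && u.length > 0
  );
  if (urls.length === 0) {
    throw new Error("No split files to merge");
  }
  // Read the vendor name from the Parse step, with an assumed fallback.
  const vendor = parsed?.body?.vendor?.name;
  const prefix = (typeof vendor === "string" && vendor.trim()
    ? vendor.trim()
    : "unknown_vendor"
  ).replace(/[^\w.-]+/g, "_"); // make the name file-system friendly
  return { url: urls.join(","), name: `${prefix}_merged_invoice.pdf` };
}

// Example with made-up URLs and a made-up vendor.
const mergeInput = buildMergeInput(
  [
    "https://example.com/a.pdf",
    "https://example.com/b.pdf",
    "https://example.com/c.pdf",
  ],
  { body: { vendor: { name: "Acme Corp" } } }
);
console.log(mergeInput.name); // → Acme_Corp_merged_invoice.pdf
```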

Step 8: Convert to Binary File

Node: HTTP Request

Settings:

URL: ={{ $json.url }}
Batch Size: 1
Batch Interval: 500 ms

Success looks like: The merged invoice is downloaded from its URL as binary data, ready for upload.
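
One cheap sanity check on the downloaded data is the file signature: PDF files start with the bytes %PDF-. The sketch below shows the check on a Node.js Buffer. The looksLikePdf helper is hypothetical, and how you obtain the bytes inside n8n depends on your setup and is not shown here.

```javascript
// Hypothetical helper: returns true when the data starts with the PDF signature.
function looksLikePdf(buffer) {
  return (
    Buffer.isBuffer(buffer) &&
    buffer.length >= 5 &&
    buffer.subarray(0, 5).toString("latin1") === "%PDF-"
  );
}

console.log(looksLikePdf(Buffer.from("%PDF-1.7\n", "latin1")));   // → true
console.log(looksLikePdf(Buffer.from("<html>Not found</html>"))); // → false
```

A false result usually means the request returned an error page or an expired link rather than the merged file.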

Step 9: Upload to Google Drive

Node: Google Drive

Settings:

Drive: My Drive
Folder: OUTPUT1 (select or paste the link to your own Google Drive folder)

Success looks like: The final invoice PDF is saved to your Google Drive folder.

Step 10: Final End-to-End Test

Send an email with subject financial_report and a PDF attachment.
Execute the workflow in n8n.
Check your Google Drive folder → the extracted and merged invoices should be saved.