Automate OCR and Document Type Detection for Scanned PDFs

Jul 31, 2025·4 Minutes Read

What You'll Have When Done: A setup that automatically runs OCR on scanned PDFs, identifies the type of each document (such as an invoice or purchase order), and sorts the files into the right folders. Useful for organizing common business documents without manual effort.

Prerequisites

Before you begin, make sure you have:

A PDF.co API Key (Get yours here)
Google Drive OAuth2 credentials configured in n8n
An n8n instance (cloud or self-hosted)
A designated "incoming" folder in Google Drive for new scanned files
Pre-created destination folders matching your classification categories (Invoice, Purchase Order, Bill, etc.)

Quick Start Options

Option A: I Want It Working Now

Import this workflow template → Download JSON File
Connect your Google Drive account in n8n
Add your PDF.co API key
Set up your folder structure and classification rules
Configure the watched folder
Test with sample scanned files
Activate and let it run

Option B: I Want to Build It Step-by-Step

Follow the 6-step guide below to create the automation from scratch.

What This Automation Does (Overview)

Monitors a specific Google Drive folder for new scanned PDF files
Performs OCR on scanned documents to make them searchable
Classifies each document using PDF.co's Document Classifier
Searches for the appropriate destination folder based on classification
Uploads the processed, searchable file to the correct organized folder

IN THIS TUTORIAL

Monitor for New Scanned Files
Perform OCR on Scanned Document
Classify the OCR-Processed Document
Find the Destination Folder
Download the Processed File
Upload File to Correct Folder

Step 1: Monitor for New Scanned Files

Node: Google Drive Trigger

Settings:

Trigger On: Changes Involving a Specific Folder
Folder from List: Select your "incoming" or "scanned documents" folder
Watch For: File Created

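Success Looks Like: The trigger fires when a new scanned file is added to your watched folder. The Google Drive Trigger checks the folder on its polling schedule, so expect a short delay rather than an instant reaction.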

Note: Make sure the scanned PDF is shared with “Anyone with the link” so PDF.co can access it for processing.

Step 2: Perform OCR on Scanned Document

Node: PDF.co API → Make PDF Searchable or Unsearchable

Settings:

URL: ={{ $json.webContentLink }}
Operation: Make PDF Searchable
Language: English (Adjust for your documents)
Advanced Options:
- File Name: ={{ $json.originalFilename }}

What This Does: Applies OCR to scanned, image-based PDFs and adds an invisible text layer, making the document fully searchable and selectable without altering its visual appearance.

Success Looks Like: PDF.co processes the document and returns a response containing the URL of the searchable version. Step 3 uses this URL as the input for classification.

Note: To learn more about the PDF Make Searchable API, visit our API Docs.

Step 3: Classify the OCR-Processed Document

Node: HTTP Request → PDF.co Document Classifier

Settings:

Method: POST
URL: https://api.pdf.co/v1/pdf/classifier
Headers:
- x-api-key: Your PDF.co API key
- Content-Type: application/json
Body (JSON):

{
  "url": "{{ $json.url }}",
  "rulescsv": "Invoice, OR, Invoice Number, Invoice #, INVOICE NO\nPurchase Order, OR, PO Number, Order Number, Order No\nBill, OR, Bill Date, Billing Period, Bill Number"
}
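For readers who want to test the same call outside n8n, here is a minimal Python sketch that builds (but does not send) the request described in the settings above. The endpoint, header names, and body fields come from this step; the function name, the placeholder API key, and the example file URL are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint taken from the Step 3 settings.
CLASSIFIER_URL = "https://api.pdf.co/v1/pdf/classifier"


def build_classifier_request(api_key: str, pdf_url: str, rules_csv: str) -> urllib.request.Request:
    """Build the POST request from Step 3 without sending it (hypothetical helper)."""
    body = json.dumps({"url": pdf_url, "rulescsv": rules_csv}).encode("utf-8")
    return urllib.request.Request(
        CLASSIFIER_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


request = build_classifier_request(
    api_key="YOUR_API_KEY",  # placeholder, replace with your PDF.co API key
    pdf_url="https://example.com/searchable.pdf",  # placeholder for the URL returned in Step 2
    rules_csv="Invoice, OR, Invoice Number, Invoice #, INVOICE NO",
)
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Building the request separately from sending it makes it easy to inspect the body and headers before spending API credits.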

Customize Your Classification Rules

The rulescsv parameter defines how documents are classified. Each rule follows the format FolderName, OR, keyword1, keyword2, keyword3. Put one rule per line; inside the JSON body, separate the rules with \n.

Examples:

Invoice, OR, Invoice Number, Invoice #, INVOICE NO
Purchase Order, OR, PO Number, Order Number, Purchase Order No
Receipt, OR, Receipt, Transaction, Purchase Date
Contract, OR, Agreement, Contract, Terms and Conditions
Statement, OR, Statement, Account Statement, Monthly Statement
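Writing the rulescsv string by hand gets error-prone as the number of categories grows. The sketch below, assuming you keep your categories and keywords in a Python dictionary, assembles the string in the format shown above. The helper name build_rules_csv is hypothetical and not part of PDF.co or n8n.

```python
def build_rules_csv(rules: dict[str, list[str]], operator: str = "OR") -> str:
    """Join each category and its keywords into one rule line, one line per category."""
    lines = []
    for category, keywords in rules.items():
        if not keywords:
            raise ValueError(f"category {category!r} has no keywords")
        # Rule format from this step: FolderName, OR, keyword1, keyword2, ...
        lines.append(", ".join([category, operator, *keywords]))
    return "\n".join(lines)


rules = {
    "Invoice": ["Invoice Number", "Invoice #", "INVOICE NO"],
    "Purchase Order": ["PO Number", "Order Number", "Order No"],
    "Bill": ["Bill Date", "Billing Period", "Bill Number"],
}
rules_csv = build_rules_csv(rules)
```

With these three categories, rules_csv reproduces the rulescsv value used in the Step 3 request body. Serializing it with a JSON library turns the line breaks into \n automatically.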

Success Looks Like: PDF.co analyzes the OCR'd document content and returns a classification result with the matched category name. The response includes a classes array with the best match.
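A document that matches none of your rules may leave the classes array empty, in which case an expression such as classes[0].class has nothing to read. The following sketch shows one way to handle that, assuming the response shape implied by this tutorial (a classes array whose items carry a class field). The "Unsorted" fallback name is a hypothetical choice, not something PDF.co returns.

```python
def best_class(response: dict, default: str = "Unsorted") -> str:
    """Return the first class name from a classifier response, or a fallback name."""
    classes = response.get("classes") or []
    if not classes:
        return default  # no rule matched, or the field is missing
    return classes[0].get("class", default)


matched = best_class({"classes": [{"class": "Invoice"}]})
unmatched = best_class({"classes": []})
```

If you adopt a fallback like this, create a folder with the same name next to your category folders so unmatched files still have a destination.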

Note: To learn more about the Document Classifier API, visit our API Docs.

Step 4: Find the Destination Folder

Node: Google Drive → Search files and folders

Settings:

Resource: File/Folder
Operation: Search
Search Query: ={{ $json.body.classes[0].class }}
Filter:
- Folder From List: Your main organizational folder (parent of all category folders)
- What to Search: Folders
Return All: Yes

Success Looks Like: The search returns the folder that matches the classification result. If the document was classified as "Invoice", the search finds your "Invoice" folder.

Tip: Ensure your folder names exactly match the classification categories you defined in Step 3.
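To see why exact names matter, consider a search that returns more than one folder, for example both "Invoice" and "Invoice Archive". The sketch below picks only the folder whose name equals the category. It assumes each search result is a dictionary with name and id fields; the id field is the one Step 6 reads, while the name field and the helper pick_folder_id are assumptions for illustration.

```python
from typing import Optional


def pick_folder_id(search_results: list[dict], category: str) -> Optional[str]:
    """Return the id of the folder whose name exactly equals the category, if any."""
    for folder in search_results:
        if folder.get("name") == category:
            return folder.get("id")
    return None  # no exact match; the caller decides how to handle this


results = [
    {"id": "folder-2", "name": "Invoice Archive"},
    {"id": "folder-1", "name": "Invoice"},
]
folder_id = pick_folder_id(results, "Invoice")
```

The comparison is case-sensitive on purpose, so a folder named "invoice" would not match the "Invoice" category.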

Step 5: Download the Processed File

Node: HTTP Request (Download OCR'd File)

Settings:

Method: GET
URL: ={{ $('PDFco Api').item.json.url }}

What This Does: Downloads the OCR-processed, searchable PDF file so it can be uploaded to the destination folder.

Success Looks Like: The processed file is downloaded and ready for upload to the classified folder.

Step 6: Upload File to Correct Folder

Node: Google Drive → Upload file

Settings:

Operation: Upload
Input Data Field Name: data
Parent Drive from List: My Drive
Parent Folder by ID: ={{ $json.id }} (from the search result in Step 4)

Success Looks Like: The processed, searchable file appears in the correctly classified destination folder.

Congrats! You’ve built an intelligent system that OCRs, classifies, and auto-organizes scanned PDFs into searchable, properly filed documents.

Your team can now simply scan and drop documents into one folder, and watch them get automatically processed and organized without any manual intervention.

Built something cool? Share it with us @pdfdotco