Automate OCR and Document Type Detection for Scanned PDFs

Jul 31, 2025·4 Minutes Read

What You'll Have When Done: A setup that automatically runs OCR on scanned PDFs, identifies the type of each document (such as an invoice or purchase order), and sorts the files into the right folders. Useful for organizing common business documents without manual effort.

Prerequisites

Before you begin, make sure you have:

  • A PDF.co API Key (Get yours here)
  • Google Drive OAuth2 credentials configured in n8n
  • An n8n instance (cloud or self-hosted)
  • A designated "incoming" folder in Google Drive for new scanned files
  • Pre-created destination folders matching your classification categories (Invoice, Purchase Order, Bill, etc.)

Quick Start Options

Option A: I Want It Working Now

  1. Import this workflow template → Download JSON File
  2. Connect your Google Drive account in n8n
  3. Add your PDF.co API key
  4. Set up your folder structure and classification rules
  5. Configure the watched folder
  6. Test with sample scanned files
  7. Activate and let it run

Option B: I Want to Build It Step-by-Step

Follow the 6-step guide below to create the automation from scratch.

What This Automation Does (Overview)

  1. Monitors a specific Google Drive folder for new scanned PDF files
  2. Performs OCR on scanned documents to make them searchable
  3. Classifies each document using PDF.co's Document Classifier
  4. Searches for the appropriate destination folder based on classification
  5. Uploads the processed, searchable file to the correct organized folder
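
As a rough mental model, the stages above can be read as one function call per file. The sketch below is plain Python rather than n8n configuration; process_scan and the four callables passed into it are hypothetical stand-ins for the nodes configured in the steps that follow, and the stub values are made up for illustration.

```python
from typing import Callable, List, Tuple


def process_scan(
    file_url: str,
    run_ocr: Callable[[str], str],
    classify: Callable[[str], str],
    find_folder: Callable[[str], str],
    upload: Callable[[str, str], None],
) -> str:
    """Push one scanned file through the stages; return the destination folder ID."""
    searchable_url = run_ocr(file_url)      # OCR: scanned PDF -> searchable PDF URL
    category = classify(searchable_url)     # classification: URL -> category name
    folder_id = find_folder(category)       # lookup: category name -> folder ID
    upload(searchable_url, folder_id)       # filing: store the searchable file
    return folder_id


# Stubbed run: each lambda fakes what the real node would return.
uploaded: List[Tuple[str, str]] = []
result = process_scan(
    "https://example.com/scan.pdf",
    run_ocr=lambda url: url + "#ocr",
    classify=lambda url: "Invoice",
    find_folder=lambda category: "folder-" + category,
    upload=lambda url, folder_id: uploaded.append((url, folder_id)),
)
```

Note that classification runs on the OCR output, not on the original scan, because the classifier needs a text layer to match keywords against.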

Step 1: Monitor for New Scanned Files

Node: Google Drive Trigger

Settings:

  • Trigger On: Changes Involving a Specific Folder
  • Folder from List: Select your "incoming" or "scanned documents" folder
  • Watch For: File Created

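Success Looks Like: The trigger fires shortly after a new scanned file is added to your watched folder. The Google Drive Trigger checks for changes on the polling schedule you configure, so expect a short delay rather than an instant response.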

Note: Make sure the scanned PDF is shared with “Anyone with the link” so PDF.co can access it for processing.

Step 2: Perform OCR on Scanned Document

Node: PDF.co API → Make PDF Searchable or Unsearchable

Settings:

  • URL: ={{ $json.webContentLink }}
  • Operation: Make PDF Searchable
  • Language: English (adjust to match your documents)
  • Advanced Options:
    • File Name: ={{ $json.originalFilename }}

What This Does: Applies OCR to scanned, image-based PDFs and adds an invisible text layer, making the document fully searchable and selectable without altering its visual appearance.

Success Looks Like: PDF.co processes the document and returns a URL to the searchable version. The next step passes that URL to the classifier.

Note: To learn more about the PDF Make Searchable API, visit our API Docs.
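
For readers who want to see roughly what this node sends, here is a minimal Python sketch that builds the equivalent HTTP request without sending it. The endpoint path /v1/pdf/makesearchable and the lang and name fields are assumptions based on PDF.co's public API and are not stated in this tutorial, so check them against the API Docs. build_ocr_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint path; confirm against the PDF.co API Docs.
MAKE_SEARCHABLE_URL = "https://api.pdf.co/v1/pdf/makesearchable"


def build_ocr_request(
    api_key: str, file_url: str, file_name: str, lang: str = "eng"
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking PDF.co to OCR a scanned PDF."""
    payload = {
        "url": file_url,    # publicly reachable link to the scanned PDF
        "lang": lang,       # assumed OCR language code, "eng" for English
        "name": file_name,  # assumed field for the output file name
    }
    return urllib.request.Request(
        MAKE_SEARCHABLE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_ocr_request(
    "YOUR_API_KEY", "https://example.com/scan.pdf", "scan.pdf"
)
```

Passing the request to urllib.request.urlopen would send it. The response is expected to contain the url field that Step 3 reads.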

Step 3: Classify the OCR-Processed Document

Node: HTTP Request → PDF.co Document Classifier

Settings:

  • Method: POST
  • URL: https://api.pdf.co/v1/pdf/classifier
  • Headers:
    • x-api-key: Your PDF.co API key
    • Content-Type: application/json
  • Body (JSON):
{
  "url": "{{ $json.url }}",
  "rulescsv": "Invoice, OR, Invoice Number, Invoice #, INVOICE NO\nPurchase Order, OR, PO Number, Order Number, Order No\nBill, OR, Bill Date, Billing Period, Bill Number"
}

Customize Your Classification Rules

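The rulescsv parameter defines how documents are classified. Each rule goes on its own line (separated by \n inside the JSON string) in the format: FolderName, OR, keyword1, keyword2, keyword3. The first field is the category name the classifier returns, and it doubles as the destination folder name in Step 4. OR means the rule matches when any one of the listed keywords appears in the document.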

Examples:

  • Invoice, OR, Invoice Number, Invoice #, INVOICE NO
  • Purchase Order, OR, PO Number, Order Number, Purchase Order No
  • Receipt, OR, Receipt, Transaction, Purchase Date
  • Contract, OR, Agreement, Contract, Terms and Conditions
  • Statement, OR, Statement, Account Statement, Monthly Statement
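
If you maintain many categories, generating the rulescsv string may be less error-prone than typing it by hand. The following is a minimal Python sketch that assumes the one-rule-per-line format shown above. build_rules_csv is a hypothetical helper, not part of any PDF.co SDK.

```python
from typing import Mapping, Sequence


def build_rules_csv(rules: Mapping[str, Sequence[str]]) -> str:
    """Turn {category: [keywords]} into a rulescsv string, one OR rule per line."""
    lines = []
    for category, keywords in rules.items():
        for field in (category, *keywords):
            # Commas and newlines are the format's separators, so reject them in fields.
            if "," in field or "\n" in field:
                raise ValueError(f"field contains a separator character: {field!r}")
        lines.append(", ".join([category, "OR", *keywords]))
    return "\n".join(lines)


rules_csv = build_rules_csv(
    {
        "Invoice": ["Invoice Number", "Invoice #", "INVOICE NO"],
        "Purchase Order": ["PO Number", "Order Number", "Order No"],
        "Bill": ["Bill Date", "Billing Period", "Bill Number"],
    }
)
```

The resulting string has the same shape as the rulescsv value in the Step 3 request body and can be pasted into it once serialized as JSON.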

Success Looks Like: PDF.co analyzes the OCR'd document content and returns a classification result with the matched category name. The response includes a classes array with the best match.

Note: To learn more about the Document Classifier API, visit our API Docs.
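
A document that matches none of your rules may come back with an empty classes array, and Step 4 would then have nothing to search for. The sketch below shows one way to read the result defensively in Python. It relies only on the classes[0].class path used in this tutorial; the rest of the response shape is an assumption, and best_class is a hypothetical helper.

```python
from typing import Optional


def best_class(response_body: dict) -> Optional[str]:
    """Return the top category name, or None when the classifier found no match."""
    classes = response_body.get("classes") or []
    if not classes:
        return None
    # The first entry is treated as the best match, mirroring classes[0].class.
    return classes[0].get("class")


matched = best_class({"classes": [{"class": "Invoice"}]})
unmatched = best_class({"classes": []})
```

In n8n, the equivalent safeguard would be an IF node that checks whether the classes array is empty and routes unmatched files to a fallback folder.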

Step 4: Find the Destination Folder

Node: Google Drive → Search files and folders

Settings:

  • Resource: File/Folder
  • Operation: Search
  • Search Query: ={{ $json.body.classes[0].class }}
  • Filter:
    • Folder From List: Your main organizational folder (parent of all category folders)
    • What to Search: Folders
  • Return All: Yes

Success Looks Like: The search returns the folder that matches the classification result. If the document was classified as "Invoice", it finds your "Invoice" folder.

Tip: Ensure your folder names exactly match the classification categories you defined in Step 3.
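
To illustrate why exact names matter, here is a Python sketch of the same lookup done directly against the Google Drive API v3 search syntax instead of through the n8n node. folder_query and pick_exact are hypothetical helpers, and the folder IDs and names in the example are made up.

```python
from typing import List, Optional

FOLDER_MIME = "application/vnd.google-apps.folder"


def folder_query(category: str, parent_id: str) -> str:
    """Build a Drive v3 `q` string for a folder with this exact name under parent_id."""
    # Backslashes and single quotes must be escaped inside a Drive query literal.
    safe = category.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"name = '{safe}' and mimeType = '{FOLDER_MIME}' "
        f"and '{parent_id}' in parents and trashed = false"
    )


def pick_exact(results: List[dict], category: str) -> Optional[str]:
    """Return the ID of the folder whose name equals the category, if any."""
    for item in results:
        if item.get("name") == category:
            return item.get("id")
    return None


query = folder_query("Invoice", "abc123")
folder_id = pick_exact(
    [{"id": "1", "name": "Invoices 2024"}, {"id": "2", "name": "Invoice"}],
    "Invoice",
)
```

A loose search can return near matches such as "Invoices 2024", so filtering for an exact name keeps files out of the wrong folder.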

Step 5: Download the Processed File

Node: HTTP Request (Download OCR'd File)

Settings:

  • Method: GET
  • URL: ={{ $('PDFco Api').item.json.url }}

What This Does: Downloads the OCR-processed, searchable PDF file so it can be uploaded to the destination folder.

Success Looks Like: The processed file is downloaded and ready for upload to the classified folder.
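
Outside n8n, the same download is a plain GET that streams the response to disk. This is a minimal Python sketch using only the standard library; download_file is a hypothetical helper and the 60-second timeout is an arbitrary choice.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the resource at url into dest and return dest."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # copyfileobj copies in chunks, so large PDFs are never held fully in memory.
        shutil.copyfileobj(response, out)
    return dest
```

For example, download_file(searchable_url, Path("scan-searchable.pdf")) would save the OCR output locally, where searchable_url is the URL returned in Step 2.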

Step 6: Upload File to Correct Folder

Node: Google Drive → Upload file

Settings:

  • Operation: Upload
  • Input Data Field Name: data
  • Parent Drive from List: My Drive
  • Parent Folder by ID: ={{ $json.id }} (from the search result in Step 4)

Success Looks Like: The processed, searchable file appears in the correctly classified destination folder.

Congrats! You’ve built an intelligent system that OCRs, classifies, and auto-organizes scanned PDFs into searchable, properly filed documents.

Your team can now simply scan and drop documents into one folder, and watch them get automatically processed and organized without any manual intervention.

Built something cool? Share it with us @pdfdotco
