Note: Make sure the scanned PDF is shared with “Anyone with the link” so PDF.co can access it for processing.
Automate OCR and Document Type Detection for Scanned PDFs
What You'll Have When Done: A setup that automatically runs OCR on scanned PDFs, identifies the type of document (such as an invoice or purchase order), and sorts the files into the right folders. Useful for organizing common business documents without manual effort.
Prerequisites
Before you begin, make sure you have:
- A PDF.co API Key (Get yours here)
- Google Drive OAuth2 credentials configured in n8n
- An n8n instance (cloud or self-hosted)
- A designated "incoming" folder in Google Drive for new scanned files
- Pre-created destination folders matching your classification categories (Invoice, Purchase Order, Bill, etc.)
Quick Start Options
Option A: I Want It Working Now
- Import this workflow template → Download JSON File
- Connect your Google Drive account in n8n
- Add your PDF.co API key
- Set up your folder structure and classification rules
- Configure the watched folder
- Test with sample scanned files
- Activate and let it run
Option B: I Want to Build It Step-by-Step
Follow the 6-step guide below to create the automation from scratch.
What This Automation Does (Overview)
- Monitors a specific Google Drive folder for new scanned PDF files
- Performs OCR on scanned documents to make them searchable
- Classifies each document using PDF.co's Document Classifier
- Searches for the appropriate destination folder based on classification
- Uploads the processed, searchable file to the correct organized folder
Step 1: Monitor for New Scanned Files
Node: Google Drive Trigger
Settings:
- Trigger On: Changes Involving a Specific Folder
- Folder from List: Select your "incoming" or "scanned documents" folder
- Watch For: File Created
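Success Looks Like: The trigger fires when a new scanned file is added to your watched folder. The Google Drive Trigger checks the folder on its polling schedule, so expect a short delay rather than an instant run.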
Step 2: Perform OCR on Scanned Document
Node: PDF.co API → Make PDF Searchable or Unsearchable
Settings:
- URL: ={{ $json.webContentLink }}
- Operation: Make PDF Searchable
- Language: English (Adjust for your documents)
- Advanced Options:
  - File Name: ={{ $json.originalFilename }}
What This Does: Applies OCR to scanned, image-based PDFs and adds an invisible text layer, making the document fully searchable and selectable without altering its visual appearance.
Success Looks Like: PDF.co processes the document and returns a URL to the searchable version. The response includes the processed file URL that can be used for classification.
Note: To learn more about the PDF Make Searchable API, visit our API Docs.
Step 3: Classify the OCR-Processed Document
Node: HTTP Request → PDF.co Document Classifier
Settings:
- Method: POST
- URL: https://api.pdf.co/v1/pdf/classifier
- Headers:
  - x-api-key: Your PDF.co API key
  - Content-Type: application/json
- Body (JSON):
{
"url": "{{ $json.url }}",
"rulescsv": "Invoice, OR, Invoice Number, Invoice #, INVOICE NO\nPurchase Order, OR, PO Number, Order Number, Order No\nBill, OR, Bill Date, Billing Period, Bill Number"
}
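To try the same call outside n8n, the request can be sketched in Python with only the standard library. The endpoint, the x-api-key header, and the url and rulescsv fields come from the settings above; the function name build_classifier_request, the placeholder API key, and the example.com file URL are made up for illustration. The request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the Step 3 settings.
CLASSIFIER_URL = "https://api.pdf.co/v1/pdf/classifier"


def build_classifier_request(api_key: str, file_url: str, rules_csv: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described in Step 3."""
    payload = {"url": file_url, "rulescsv": rules_csv}
    return urllib.request.Request(
        CLASSIFIER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values only: not a real key and not a real file.
request = build_classifier_request(
    api_key="YOUR_PDFCO_API_KEY",
    file_url="https://example.com/searchable.pdf",
    rules_csv="Invoice, OR, Invoice Number, Invoice #, INVOICE NO",
)

# Sending it would look like: urllib.request.urlopen(request),
# followed by json.loads() on the response body.
```

Keeping the build step separate from the send step makes it easy to inspect the payload before spending API credits.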
Customize Your Classification Rules
The rulescsv parameter defines how documents are classified. Each rule sits on its own line (separated by \n in the JSON body) and follows the format: FolderName, OR, keyword1, keyword2, keyword3
Examples:
Invoice, OR, Invoice Number, Invoice #, INVOICE NO
Purchase Order, OR, PO Number, Order Number, Purchase Order No
Receipt, OR, Receipt, Transaction, Purchase Date
Contract, OR, Agreement, Contract, Terms and Conditions
Statement, OR, Statement, Account Statement, Monthly Statement
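If you maintain many categories, assembling the rulescsv string by hand gets error-prone. A minimal Python sketch of generating it from a mapping is shown here. The helper name build_rules_csv is hypothetical, and the check that rejects commas and line breaks inside keywords is a precaution added for this sketch, since those characters would make a comma-separated rule ambiguous.

```python
from typing import Mapping, Sequence


def build_rules_csv(categories: Mapping[str, Sequence[str]], operator: str = "OR") -> str:
    """Join categories and keywords into 'Name, OR, kw1, kw2' lines."""
    lines = []
    for name, keywords in categories.items():
        for value in (name, *keywords):
            # Precaution for this sketch: these characters would break the format.
            if "," in value or "\n" in value:
                raise ValueError(f"Unsupported character in rule value: {value!r}")
        lines.append(", ".join([name, operator, *keywords]))
    return "\n".join(lines)


rules = build_rules_csv({
    "Invoice": ["Invoice Number", "Invoice #", "INVOICE NO"],
    "Purchase Order": ["PO Number", "Order Number", "Order No"],
    "Bill": ["Bill Date", "Billing Period", "Bill Number"],
})
print(rules)
```

The resulting string matches the rulescsv value used in the Step 3 request body once it is JSON-encoded.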
Success Looks Like: PDF.co analyzes the OCR'd document content and returns a classification result with the matched category name. The response includes a classes array with the best match.
Note: To learn more about the Document Classifier API, visit our API Docs.
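The next step reads the first entry of the classes array, so it helps to decide what should happen when nothing matches. The sketch below assumes only the one field the workflow itself uses (classes[0].class); the sample dictionaries are trimmed-down illustrations rather than real API output, and the "Unsorted" fallback category is a hypothetical choice.

```python
from typing import Any, Mapping, Optional


def best_category(response_body: Mapping[str, Any], fallback: Optional[str] = None) -> Optional[str]:
    """Return the first matched class name, or the fallback when there is none."""
    classes = response_body.get("classes") or []
    if not classes:
        return fallback
    return classes[0].get("class", fallback)


# Illustrative bodies containing only the field the workflow reads.
matched = {"classes": [{"class": "Invoice"}]}
unmatched = {"classes": []}

print(best_category(matched, fallback="Unsorted"))
print(best_category(unmatched, fallback="Unsorted"))
```

In n8n, a comparable guard could be an IF node placed before the folder search, so unclassified files are routed somewhere predictable instead of failing the run.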
Step 4: Find the Destination Folder
Node: Google Drive → Search files and folders
Settings:
- Resource: File/Folder
- Operation: Search
- Search Query: ={{ $json.body.classes[0].class }}
- Filter:
  - Folder From List: Your main organizational folder (parent of all category folders)
  - What to Search: Folders
- Return All: Yes
Success Looks Like: The search returns the folder that matches the classification result. If "Invoice" was classified, it finds your "Invoice" folder.
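For readers who want to reproduce this lookup against the Google Drive API directly, a comparable search can be expressed as a Drive v3 q string. This is a sketch of that query syntax, not a description of what the n8n node sends internally; the helper names and the PARENT_FOLDER_ID placeholder are hypothetical.

```python
# MIME type that Google Drive uses for folders.
FOLDER_MIME = "application/vnd.google-apps.folder"


def escape_drive_value(value: str) -> str:
    """Escape backslashes and single quotes for use inside a Drive query literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def folder_query(category: str, parent_id: str) -> str:
    """Build a q string that finds a non-trashed folder by exact name under a parent."""
    return (
        f"name = '{escape_drive_value(category)}' "
        f"and mimeType = '{FOLDER_MIME}' "
        f"and '{escape_drive_value(parent_id)}' in parents "
        "and trashed = false"
    )


query = folder_query("Purchase Order", "PARENT_FOLDER_ID")
print(query)
```

Restricting the query to one parent folder avoids picking up an unrelated folder with the same name elsewhere in the drive.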
Tip: Ensure your folder names exactly match the classification categories you defined in Step 3.
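A quick way to catch naming mismatches before activating the workflow is to compare the category names in your rules against the folder names you created. The following Python sketch does that with an exact, case-sensitive comparison; the function name missing_folders and the sample folder list are hypothetical.

```python
from typing import Iterable, List


def missing_folders(rules_csv: str, folder_names: Iterable[str]) -> List[str]:
    """Return rule categories that have no folder with exactly the same name."""
    categories = [
        line.split(",", 1)[0].strip()
        for line in rules_csv.splitlines()
        if line.strip()
    ]
    existing = set(folder_names)
    return [category for category in categories if category not in existing]


rules_csv = (
    "Invoice, OR, Invoice Number, Invoice #, INVOICE NO\n"
    "Purchase Order, OR, PO Number, Order Number, Order No\n"
    "Bill, OR, Bill Date, Billing Period, Bill Number"
)

# Sample folder list with a deliberate mismatch ("Purchase Orders").
print(missing_folders(rules_csv, ["Invoice", "Purchase Orders", "Bill"]))
```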
Step 5: Download the Processed File
Node: HTTP Request (Download OCR'd File)
Settings:
- Method: GET
- URL: ={{ $('PDFco Api').item.json.url }}
What This Does: Downloads the OCR-processed, searchable PDF file so it can be uploaded to the destination folder.
Success Looks Like: The processed file is downloaded and ready for upload to the classified folder.
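Outside n8n, the same download could be sketched in Python as below. The function name download_pdf and its opener parameter are hypothetical; the opener exists only so the demo can run offline with a fake response. The check for the %PDF- header is a cheap sanity test added for this sketch, useful for spotting an error page saved under a .pdf name.

```python
import io
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import BinaryIO, Callable


def download_pdf(
    url: str,
    dest: Path,
    opener: Callable[[str], BinaryIO] = urllib.request.urlopen,
) -> Path:
    """Stream a file to dest, refusing content that does not start like a PDF."""
    with opener(url) as response:
        head = response.read(5)
        if head != b"%PDF-":
            raise ValueError("Downloaded content does not look like a PDF")
        with open(dest, "wb") as out:
            out.write(head)
            shutil.copyfileobj(response, out)
    return dest


# Offline demo: a fake opener stands in for the network call.
demo_path = Path(tempfile.gettempdir()) / "demo_searchable.pdf"
download_pdf(
    "https://example.com/searchable.pdf",
    demo_path,
    opener=lambda _url: io.BytesIO(b"%PDF-1.7\n"),
)
```

With the default opener, the function fetches the URL for real, so the URL returned by the OCR step can be passed in directly.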
Step 6: Upload File to Correct Folder
Node: Google Drive → Upload file
Settings:
- Operation: Upload
- Input Data Field Name: data
- Parent Drive from List: My Drive
- Parent Folder by ID: ={{ $json.id }} (from the search result in Step 4)
Success Looks Like: The processed, searchable file appears in the correctly classified destination folder.
Congrats! You’ve built an intelligent system that OCRs, classifies, and auto-organizes scanned PDFs into searchable, properly filed documents.
Your team can now simply scan and drop documents into one folder, and watch them get automatically processed and organized without any manual intervention.
Built something cool? Share it with us @pdfdotco