Note: Some familiarity with regular expressions (regex) is recommended to redact documents successfully.
Automatically Redact Sensitive Data from PDFs and Save Them Securely
What You'll Have When Done: An automated system that monitors Google Drive for confidential documents, redacts sensitive information (SSNs, credit card numbers, addresses, etc.), and saves the cleaned PDFs to a secure folder.
Prerequisites
Before you begin, make sure you have:
- A PDF.co API Key
- A Google Drive account with OAuth2 credentials in n8n
- Two Google Drive folders:
  - Source folder for confidential documents (Create one here)
  - Destination folder for redacted documents
- An n8n instance (cloud or self-hosted)
- Sample employee records with sensitive data (Sample PDF)
Quick Start Options
Option A: I Want It Working Now
- Import this workflow template → Download JSON File
- Connect your Google Drive and PDF.co accounts
- Set your source and destination folders
- Test with a sample confidential document
- Activate and run
Option B: I Want to Build It Step-by-Step
Follow the six-step guide below to build the automation from scratch.
What This Automation Does (Overview):
- Monitors a Google Drive folder for new confidential PDFs
- Downloads each new document automatically
- Uploads to PDF.co for processing
- Redacts all sensitive data using regex patterns
- Downloads the redacted PDF
- Saves the clean document to a secure folder
Step 1: Monitor for New Confidential Documents
Node: Google Drive Trigger → File Created
Settings:
- Trigger On:
Changes Involving a Specific Folder
- Folder to Watch:
YOUR_CONFIDENTIAL_FOLDER
- Watch For:
File Created
Success Looks Like: The trigger fires whenever a new PDF is added to your confidential folder. The file metadata is passed to the next node.
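As configured above, nothing in the trigger settings restricts it to PDFs, so other file types dropped into the folder may reach the download step too. The following is a minimal Python sketch of a filter, assuming the trigger output includes the Google Drive `mimeType` and `name` fields. `is_pdf` is a hypothetical helper for illustration; inside n8n the same check would more likely live in an IF node.

```python
PDF_MIME_TYPE = "application/pdf"


def is_pdf(file_meta: dict) -> bool:
    """Return True when Drive file metadata describes a PDF (hypothetical helper)."""
    mime_type = file_meta.get("mimeType", "")
    name = file_meta.get("name", "")
    # Prefer the MIME type; fall back to the file extension if it is missing.
    if mime_type:
        return mime_type == PDF_MIME_TYPE
    return name.lower().endswith(".pdf")


if __name__ == "__main__":
    # Made-up metadata items shaped like Drive file resources.
    print(is_pdf({"id": "abc123", "name": "record.pdf", "mimeType": "application/pdf"}))
    print(is_pdf({"id": "def456", "name": "notes.txt", "mimeType": "text/plain"}))
```

Only items that pass the check would continue to Step 2.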
Step 2: Download the Document
Node: Google Drive → Download
Settings:
- Operation:
Download
- File by ID:
{{ $json.id }}
Success Looks Like: The PDF file is downloaded as binary data, ready for processing.
Step 3: Upload to PDF.co
Node: PDF.co → Upload File
Settings:
- Binary Data:
True
- File Name:
file.pdf
Success Looks Like: The file is uploaded to PDF.co's temporary storage. You receive a URL for the uploaded file.
Step 4: Redact Sensitive Information
Node: PDF.co → Search & Replace Text or Delete
Settings:
- Operation Type: Search and Delete Text
- PDF URL:
{{ $json.url }}
- Add Search Patterns:
(?<=Name:\s)[A-Za-z\s]+(?=\s*Email)
(?<=Email:\s)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
(?<=Phone:\s)\([0-9]{3}\)\s[0-9]{3}-[0-9]{4}
(?<=SSN:\s)[0-9]{3}-[0-9]{2}-[0-9]{4}
(?<=Date of Birth:\s)[A-Za-z]+\s+\d{1,2},\s+\d{4}
(?<=Address:\s)[0-9]+\s[A-Za-z0-9\s,]+(?=\s*Credit Card)
(?<=Credit Card:\s)[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}
(?<=Employee ID:\s)EMP-[0-9]{4}-[0-9]{3}
(?<=Salary:\s)\$[0-9,]+
- Advanced Options:
- Replacement Limit:
0
- Use Regular Expressions:
True
- Custom Profiles:
{"UsePatch": true, "PatchColor": "#000000"}
Success Looks Like: All sensitive data is replaced with black redaction boxes. The output URL points to the redacted PDF.
Note: These regex patterns are specifically designed for our sample employee record PDF. You’ll need to adjust them based on your document format and the fields you want to redact.
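Before running adjusted patterns against real documents, it can help to try them on plain text first. The sketch below does this with Python's `re` module, using the Step 4 patterns and a made-up record (the values in `SAMPLE_TEXT` are invented, not taken from the sample PDF). It assumes the text extracted from your PDF resembles the `Label: value` layout shown, and Python's regex engine may differ in small ways from the one PDF.co uses, so treat a pass here as a sanity check only. The name and address patterns also capture trailing whitespace before the next label, which the sketch strips for display.

```python
import re

# Patterns copied from the Step 4 search list.
PATTERNS = {
    "name": r"(?<=Name:\s)[A-Za-z\s]+(?=\s*Email)",
    "email": r"(?<=Email:\s)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "phone": r"(?<=Phone:\s)\([0-9]{3}\)\s[0-9]{3}-[0-9]{4}",
    "ssn": r"(?<=SSN:\s)[0-9]{3}-[0-9]{2}-[0-9]{4}",
    "dob": r"(?<=Date of Birth:\s)[A-Za-z]+\s+\d{1,2},\s+\d{4}",
    "address": r"(?<=Address:\s)[0-9]+\s[A-Za-z0-9\s,]+(?=\s*Credit Card)",
    "credit_card": r"(?<=Credit Card:\s)[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}",
    "employee_id": r"(?<=Employee ID:\s)EMP-[0-9]{4}-[0-9]{3}",
    "salary": r"(?<=Salary:\s)\$[0-9,]+",
}

# Invented record for illustration only.
SAMPLE_TEXT = (
    "Name: Jane Doe\n"
    "Email: jane.doe@example.com\n"
    "Phone: (555) 123-4567\n"
    "SSN: 123-45-6789\n"
    "Date of Birth: January 15, 1985\n"
    "Address: 123 Main Street, Springfield, IL 62704\n"
    "Credit Card: 4111-1111-1111-1111\n"
    "Employee ID: EMP-2024-001\n"
    "Salary: $85,000\n"
)


def find_matches(text: str) -> dict:
    """Map each label to the whitespace-stripped matches its pattern finds."""
    return {
        label: [match.strip() for match in re.findall(pattern, text)]
        for label, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    for label, matches in find_matches(SAMPLE_TEXT).items():
        status = "ok" if matches else "NO MATCH"
        print(f"{label:12} {status:9} {matches}")
```

Any label reported as `NO MATCH` points to a pattern that needs adjusting for your document layout.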
You can learn more about the PDF Search and Delete Text API at the API Docs.
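For readers curious what the node's settings might look like as a raw API call, here is a hedged Python sketch that only assembles the request and never sends it. The endpoint path, the `x-api-key` header, and the body field names (`searchStrings`, `regex`, `replacementLimit`, `profiles`) are assumptions based on my reading of the public PDF.co API and should be confirmed against the API Docs. `build_delete_text_request` is a hypothetical helper.

```python
import json

# Assumed endpoint; confirm against the PDF.co API Docs before use.
DELETE_TEXT_ENDPOINT = "https://api.pdf.co/v1/pdf/edit/delete-text"


def build_delete_text_request(pdf_url: str, patterns: list, api_key: str) -> dict:
    """Assemble (but do not send) a request mirroring the Step 4 settings."""
    # Same custom profile as in the node's Advanced Options.
    profiles = {"UsePatch": True, "PatchColor": "#000000"}
    return {
        "endpoint": DELETE_TEXT_ENDPOINT,
        "headers": {"x-api-key": api_key, "Content-Type": "application/json"},
        "body": {
            "url": pdf_url,                 # URL returned by the upload step
            "searchStrings": patterns,      # assumed field name for the pattern list
            "regex": True,                  # Use Regular Expressions: True
            "replacementLimit": 0,          # Replacement Limit: 0
            "profiles": json.dumps(profiles),
        },
    }


if __name__ == "__main__":
    request = build_delete_text_request(
        pdf_url="https://example.com/uploaded.pdf",  # placeholder URL
        patterns=[r"(?<=SSN:\s)[0-9]{3}-[0-9]{2}-[0-9]{4}"],
        api_key="YOUR_PDFCO_API_KEY",
    )
    print(json.dumps(request["body"], indent=2))
```

Keeping the request construction separate from the HTTP call makes it easy to inspect the payload before any document leaves your machine.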
Step 5: Download Redacted PDF
Node: HTTP Request
Settings:
- Method:
GET
- URL:
{{ $json.url }}
Success Looks Like: The redacted PDF is downloaded as binary data, ready to be saved.
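If the download fails, the response body may be an error message and not a PDF, and saving it would put a broken file in the secure folder. A small Python sketch of a guard follows, assuming the downloaded bytes are available in memory; `looks_like_pdf` is a hypothetical helper, and in n8n an equivalent check could sit between Steps 5 and 6.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True when the payload carries a PDF header near its start."""
    # PDF readers commonly tolerate a few bytes before the header,
    # so search a short prefix instead of requiring an exact start.
    return PDF_MAGIC in data[:1024]


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.7\n% rest of file"))
    print(looks_like_pdf(b'{"error": true, "message": "not found"}'))
```

This only confirms the file type; it does not verify that the redaction itself succeeded.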
Step 6: Save to Secure Folder
Node: Google Drive → Upload
Settings:
- Input Data Field Name:
data
- Parent Folder:
YOUR_REDACTED_FILE_FOLDER
Success Looks Like: The redacted PDF appears in your secure folder with all sensitive information removed.
Note: Consider setting up folder permissions to restrict access to redacted documents.
Congrats! You've built a document redaction workflow that protects sensitive data automatically.
Built something cool? Share it with us @pdfdotco