Auto-Process Editor’s Articles from Google Drive to TXT + HTML + Compressed PDF (with timestamped filenames)
What you’ll have when done
A hands-free pipeline that:
- Watches a Drive folder for new editor PDFs.
- Normalizes the created time into a human-readable date (YYYY-MM-DD).
- Converts the PDF into:
  - Text (TXT) for search/indexing,
  - HTML for web use,
  - Compressed PDF for archiving.
- Downloads each converted file as binary.
- Uploads each to its own Drive folder, with standardized filenames that include the date (for audit and record-keeping).
Prerequisites
- PDF.co API key (set in your “PDF.co account” credentials). Get yours here: https://app.pdf.co/
- Google Drive OAuth2 connected in n8n.
- Three destination Drive folders (copy their IDs):
  - Output_Web_HTML folder ID: 1O9zJ2swIu4Ya8bMbFSJlMzr1D_cefFqp
  - Output_Search_TXT folder ID: 1_HHoFS4MS-wBXGcr3bXwIjx8vC6t-G7Y
  - Output_Archive_PDFs folder ID: 1Pg6I9aSacilnZXpMsGT6FdJh0hx5atki
- One source Drive folder to watch for new PDFs (your trigger folder).
- A sample PDF Article you can test with. Link Here
Quick Start Options
Option A: I Want It Working Now
- Import this workflow template → Download JSON File
- Connect your Google Drive account in n8n
- Add your PDF.co API key in the PDF.co node
- Update the Google Drive Trigger with your source folder
- Update the Google Drive Upload nodes with your destination folders
- Test with a sample PDF Article
- Activate and let it run
Option B: Step-by-Step Build Guide
Step 1: Trigger: Watch a Drive folder for new PDFs
Node: Google Drive Trigger
What it does: Fires whenever a new file is created in your chosen folder and passes the file’s metadata downstream (including webContentLink, createdTime, etc.).
Settings
- Trigger On: Changes involving a specific folder
- Folder From List: (pick your source folder)
- Watch For: File Created
- (Optional) Options → File Type: PDF
Success looks like: When you add a new PDF, the workflow starts and you can see the file metadata in the node’s output.
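If you want to confirm those fields before building the rest of the flow, a throwaway Code node right after the trigger can surface them. This is an optional sanity check, not part of the template; it assumes the default Google Drive Trigger output fields named above.

// Optional sanity check: paste into a temporary Code node after the trigger.
// It fails loudly if the metadata the rest of the workflow depends on is missing.
return items.map(item => {
  const { name, webContentLink, createdTime } = item.json;
  if (!webContentLink || !createdTime) {
    throw new Error(`Missing webContentLink or createdTime on "${name}". Check the trigger's output.`);
  }
  return { json: { name, webContentLink, createdTime } };
});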
Step 2: Normalize the created date for filenames
Node: Code (right after the trigger)
What it does: Takes Google’s createdTime (YYYY-MM-DDTHH:mm:ss.sssZ) and produces a clean date YYYY-MM-DD you can safely put in filenames.
JS to paste (you’re already using this):
/**
 * Adds `cleanDate` = YYYY-MM-DD derived from Created At or createdTime
 */
return items.map(item => {
  const rawDate = item.json['Created At'] || item.json.createdTime || new Date().toISOString();
  const cleanDate = rawDate.split('T')[0];
  return { json: { ...item.json, cleanDate } };
});

Success looks like: The node output includes "cleanDate": "2025-11-01" (for example).
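If your source dates might not be ISO strings (for example a manually set “Created At” column), a slightly more defensive variant parses the value through Date first. This is an optional alternative to the snippet above, not something the template requires:

// Optional, more defensive variant: run the raw value through Date
// so non-ISO inputs still produce a YYYY-MM-DD (UTC) date.
return items.map(item => {
  const rawDate = item.json['Created At'] || item.json.createdTime || new Date().toISOString();
  const parsed = new Date(rawDate);
  const cleanDate = isNaN(parsed) ? String(rawDate).split('T')[0] : parsed.toISOString().split('T')[0];
  return { json: { ...item.json, cleanDate } };
});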
Step 3: Convert the PDF into three formats (run in parallel)
This is where we fork: connect the output of Code to each of the three PDF.co nodes directly (do not chain them).
Step 1: Convert to Text
Node: Convert to Text (your custom PDF.co node)
What it does: Converts the PDF to a .txt file (great for search, indexing, and quick content grabs).
Settings
- Operation: Convert from PDF
- URL: ={{ $json.webContentLink }}
- Advanced Options → File Name: =FeaturePlaybook_{{ $json.cleanDate }}_Text.txt
- Output: JSON with a url to the generated TXT file.
Step 2: Convert to HTML
Node: Convert to HTML
What it does: Converts the PDF to HTML (handy for web/CMS ingestion or review).
Settings
- Operation: Convert from PDF
- Convert Type: toHtml
- URL: ={{ $('Code').item.json.webContentLink }}
- Advanced Options → File Name: =FeaturePlaybook_{{ $('Code').item.json.cleanDate }}_Web.html
- Output: JSON with a url to the generated HTML file.
Step 3: Compress the PDF
Node: Optimize PDF
What it does: Produces a smaller, web-optimized PDF to store/ship.
Settings
- Operation: Compress PDF
- URL: ={{ $('Google Drive Trigger').item.json.webContentLink }}
- Advanced Options → File Name: =FeaturePlaybook_{{ $('Code').item.json.cleanDate }}_Optimized.pdf
- Output: JSON with a url to the compressed PDF.
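Under the hood, each of these nodes calls a PDF.co REST endpoint and hands back JSON whose url field points at the generated file. If you ever need to reproduce a branch without the dedicated node, the call looks roughly like the sketch below (shown for the text branch). The endpoint path, header, and body fields are assumptions based on PDF.co's public API documentation; verify them against the current docs before relying on this.

// Rough equivalent of the "Convert to Text" branch as a direct PDF.co API call.
// Assumes Node 18+ (global fetch); endpoint and fields follow PDF.co's public docs.
async function convertPdfToText(webContentLink, cleanDate, apiKey) {
  const response = await fetch('https://api.pdf.co/v1/pdf/convert/to/text', {
    method: 'POST',
    headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: webContentLink,                            // source PDF (the Drive link)
      name: `FeaturePlaybook_${cleanDate}_Text.txt`   // same naming pattern as the node
    })
  });
  const result = await response.json();
  return result.url; // temporary URL of the generated TXT file
}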
Step 4: Download each converted file as binary
Google Drive Upload requires binary input. Your PDF.co nodes return a URL — so we add an HTTP Request node after each conversion to download the file as binary.
For each branch (TXT, HTML, PDF), add an HTTP Request node:
Node (example): Binary Text File / Binary HTML File / Binary Compressed file
What it does: Fetches the file and stores it as binary.data.
Settings (v4.2 of HTTP Request)
- Method: GET
- URL: ={{ $json.url }}
- Options → Add Option → Response Format: File
- Options → Add Option → Binary Property: data
Success looks like: The node execution shows a Binary tab with a data entry (not a giant HTML or PDF string in JSON).
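If you want an explicit guard before the uploads (handy when chasing the “expects a binary file” error covered in Troubleshooting below), you can optionally drop a small Code node between each download and its upload. This is an extra safety net, not part of the template:

// Optional guard between the HTTP Request download and the Drive upload.
// Fails with a clear message if the expected binary "data" property is missing.
return items.map(item => {
  if (!item.binary || !item.binary.data) {
    throw new Error('Expected binary property "data". Set Response Format = File on the HTTP Request node.');
  }
  return item;
});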
Step 5: Upload each binary to its destination Drive folder
You can upload separately (one Drive node per branch) or collect and route. The simplest for non-technical users is one upload per branch.
Step 1: Upload Compressed PDF to Archive
Node: Google Drive
What it does: Uploads the optimized PDF.
Settings
- Resource: File
- Operation: Upload
- Input Data Field Name (Binary Property): data
- Parent Drive: My Drive
- Parent Folder by ID: 1Pg6I9aSacilnZXpMsGT6FdJh0hx5atki
Step 2: Upload HTML to Web folder
Node: Google Drive1
Settings
- Resource: File
- Operation: Upload
- Input Data Field Name (Binary Property): data
- Parent Folder by ID: 1O9zJ2swIu4Ya8bMbFSJlMzr1D_cefFqp
Step 3: Upload TXT to Search folder
Node: Google Drive2
Settings
- Resource: File
- Operation: Upload
- Input Data Field Name (Binary Property): data
- Parent Folder by ID: 1_HHoFS4MS-wBXGcr3bXwIjx8vC6t-G7Y
Success looks like: You will see three new files appear in their corresponding folders, named:
- FeaturePlaybook_YYYY-MM-DD_Web.html
- FeaturePlaybook_YYYY-MM-DD_Text.txt
- FeaturePlaybook_YYYY-MM-DD_Optimized.pdf
Why each step matters
- Trigger: listens so nobody has to click “run”.
- Code (date): converts technical timestamps into a simple date you can read and reuse in filenames.
- Three conversions: each format has a purpose:
- TXT → search, quick copy/paste, NLP ingestion.
- HTML → web preview/CMS import.
- Compressed PDF → smaller to archive/share.
- HTTP Request (File): turns each URL into a real file (binary) so Drive can upload it.
- Separate uploads: sends each file to the right home, keeping your Drive tidy and consistent.
The exact JS snippets you’ll reuse
A) Normalize the date (already in your workflow)
return items.map(item => {
  const rawDate = item.json['Created At'] || item.json.createdTime || new Date().toISOString();
  const cleanDate = rawDate.split('T')[0];
  return { json: { ...item.json, cleanDate } };
});
Troubleshooting (read this if something fails)
- Drive Upload says: “expects a binary file ‘data’” → Your HTTP Request download is not set to Response Format: File + Binary Property: data. Fix and re-run.
- Only one output shows up → Your three conversion nodes are chained in a line. Make sure the Code node forks to all three conversions directly.
- HTML or TXT shows as big string in JSON → You didn’t download as File. Edit the HTTP Request node options.
- Wrong folder → In Drive Upload, tick Use ID and paste the folder ID, not the folder name.
- Filename missing extension → Ensure your PDF.co “name” fields include .html, .txt, .pdf.
Your sample workflow JSON (annotated)
You’re already very close. The big wins you implemented:
- Forked from Code → Optimize PDF, Convert to HTML, Convert to Text.
- After each conversion, added an HTTP Request node with:
"options": {
"response": {
"response": {
"responseFormat": "file"
}
}
}
That’s the correct way to produce binary output in HTTP Request v4.2. Just ensure each Drive node’s Input Data Field Name (aka Binary Property) is data.
Done!
You now have a reliable editorial pipeline:
- Editors drop a PDF → your workflow names it consistently by created date,
- converts it to TXT, HTML, and compressed PDF,
- and files each format to its proper Drive folder with timestamped filenames for clean history and audit.
Built something cool with this workflow? Share it with us @pdfdotco