Smart PDF Processing: Transform Any Document into Actionable Data

Aug 14, 2025 · 6 Minutes Read

What You'll Have When Done

A fully automated system that monitors your Google Drive for new PDF uploads, intelligently extracts structured data using AI, and populates a Google Sheets dashboard with actionable insights.

Prerequisites

Before you begin, make sure you have:

  • A PDF.co API Key (Get yours here)
  • OpenAI API credentials configured in n8n (or n8n's free OpenAI credits)
  • Google Drive OAuth2 credentials configured in n8n
  • Google Sheets OAuth2 credentials configured in n8n
  • An n8n instance (cloud or self-hosted)
  • A Google Drive folder for monitoring new lab reports
  • A Google Sheet with columns for lab report data (patient info, lab values, critical alerts)

Quick Start Options

Option A: I Want It Working Now

  1. Import this workflow template → Download JSON File
  2. Connect your Google accounts (Drive, Sheets) in n8n
  3. Add your PDF.co API key
  4. Configure your OpenAI API credentials
  5. Set up your Google Drive folder for lab reports
  6. Create/configure your Google Sheet with lab data columns
  7. Test with a sample lab report
  8. Activate and let it run

Option B: I Want to Build It Step-by-Step

Follow the 7-step guide below to create the automation from scratch.

What This Automation Does (Overview)

  • Monitors your Google Drive folder for new PDF lab reports
  • Converts PDF documents to structured JSON data
  • Extracts text and numerical data from complex medical reports
  • Analyzes lab results using AI to identify critical values and clinical significance
  • Structures patient demographics, test results, and critical alerts
  • Populates a comprehensive Google Sheets database for tracking and analysis

Real Example: Upload a Complete Blood Count (CBC) lab report and get structured data in your spreadsheet, including patient info, all lab values with flags, critical alerts, physician notifications, and clinical assessments. View Sample Files Here.

Step-by-Step Build Guide

Step 1: Monitor for New Lab Reports

Node: Google Drive Trigger

Settings:

  • Trigger On: Changes Involving a Specific Folder
  • Folder to Watch: Select your designated lab report upload folder
  • Watch For: File Created

What This Does: Automatically detects when medical staff or lab systems upload new PDF reports to your designated Google Drive folder and starts the processing workflow.

Success Looks Like: The trigger fires shortly after a lab report is uploaded (the node polls the folder on a schedule, so expect a short delay), capturing the file information and download link for processing.

Important: Set the file's sharing to “Anyone with the link” so that PDF.co can access the file.

Step 2: Convert PDF to Structured Data

Node: PDF.co API → Convert from PDF

Settings:

  • Convert Type: PDF to JSON (Simple)
  • URL: {{ $json.webContentLink }}

What This Does: Transforms the PDF lab report into structured JSON format, preserving the layout and making text extraction possible while maintaining the relationship between data elements.

Success Looks Like: PDF.co returns structured JSON containing all text elements, tables, and formatting information from the medical report.
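If you want a feel for what this node does behind the scenes, here is a rough sketch of an equivalent REST request. Treat the details as assumptions: the endpoint path, the x-api-key header, and the body fields are based on PDF.co's public REST API and should be checked against their API reference, and the n8n node may call a different variant. buildPdfToJsonRequest is a hypothetical helper that only builds the request and does not send it.

```javascript
// Assumed default endpoint; verify against the PDF.co API reference.
const ASSUMED_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/json2";

// Hypothetical helper: describes the HTTP request without sending it.
function buildPdfToJsonRequest(apiKey, fileUrl, endpoint = ASSUMED_ENDPOINT) {
  if (!apiKey) throw new Error("A PDF.co API key is required");
  if (!fileUrl) throw new Error("A publicly reachable PDF URL is required");
  return {
    url: endpoint,
    method: "POST",
    headers: {
      "x-api-key": apiKey, // assumed auth header name
      "Content-Type": "application/json",
    },
    // fileUrl plays the role of {{ $json.webContentLink }} in the node settings.
    body: JSON.stringify({ url: fileUrl }),
  };
}

// Sending it (for example with fetch) could look like:
//   const req = buildPdfToJsonRequest(key, link);
//   const res = await fetch(req.url, req);
//   const { url } = await res.json(); // link to the converted JSON, used in Step 3
```

Inside n8n you don't need any of this; the PDF.co node handles the call and exposes the result link as {{ $json.url }}.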

Step 3: Download Converted Data

Node: HTTP Request

Settings:

  • Method: GET
  • URL: {{ $json.url }} (from the conversion step)

What This Does: Downloads the converted JSON data from PDF.co's servers so it can be processed and analyzed by the next steps.

Success Looks Like: The structured lab report data is downloaded and ready for text extraction and AI analysis.

Step 4: Extract Text from Document Structure

Node: Code (JavaScript)

JavaScript Code: Get The Code Here

What This Does: Processes the complex JSON structure to extract readable text from the medical report, handling nested data structures and preserving important formatting and spacing.

Key Logic:

  • Navigates through document pages and row/column structures
  • Extracts text from each data cell
  • Maintains proper spacing and line breaks
  • Creates a clean text version for AI analysis

Success Looks Like: Clean, readable text extracted from the lab report containing all patient information, test results, and clinical notes.
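The linked code isn't reproduced here, but the key logic above can be sketched as follows. This is a minimal illustration, not the tutorial's actual script: it assumes the converted JSON has the shape document → page → row → column → text (with the cell text under a "#text" key), which you should confirm against your own Step 3 output. The helper names toArray, cellText, and extractText are made up for this example.

```javascript
// Normalize a value that may be missing, a single object, or an array.
function toArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Read the text of one cell. The "#text" key is an assumed shape.
function cellText(column) {
  const text = column && column.text;
  if (typeof text === "string") return text;
  if (text && typeof text === "object" && text["#text"] !== undefined) {
    return String(text["#text"]);
  }
  return "";
}

// Walk pages -> rows -> columns and build plain text:
// cells in a row are joined with spaces, rows with newlines,
// and pages with a blank line.
function extractText(pdfJson) {
  const pages = toArray(pdfJson && pdfJson.document && pdfJson.document.page);
  const pageTexts = pages.map((page) =>
    toArray(page.row)
      .map((row) =>
        toArray(row.column)
          .map((column) => cellText(column).trim())
          .filter((cell) => cell.length > 0)
          .join(" ")
      )
      .filter((line) => line.length > 0)
      .join("\n")
  );
  return pageTexts.filter((text) => text.length > 0).join("\n\n");
}

// In an n8n Code node, the final lines might look like:
//   const pdfJson = $input.first().json;
//   return [{ json: { extractedText: extractText(pdfJson) } }];
```

The output field is named extractedText so that it lines up with the expression used in the Step 5 message.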

Step 5: AI Analysis of Lab Results

Node: OpenAI → Message a Model

Settings:

  • Model: GPT-4.1-mini (or another available model)
  • Prompt: Custom medical analysis prompt for extracting:
    • Patient demographics (name, DOB, age, sex, patient ID)
    • Test details (specimen type, collection date, ordering physician)
    • All lab values with results, units, reference ranges, and abnormal flags
    • Critical values and physician notifications
    • Clinical assessment of abnormal findings
  • Message: (Sample)
Analyze this document and extract structured information:

{{$node["Code"].json["extractedText"]}}

Please extract and structure the following information in JSON format:
1. Document type and key identifiers
2. Important dates and references
3. Key data points with values and context
4. Critical information requiring attention
5. Overall document summary

Return only valid JSON with a clear structure.

What This Does: Uses the language model to interpret the report text in its medical context, identify critical lab values, extract structured data, and summarize the clinical significance of abnormal findings.

Success Looks Like: AI returns structured JSON containing all relevant medical data with proper categorization and clinical context.
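Even when asked to return only valid JSON, a model may wrap its answer in a Markdown code fence or add a sentence around it. A small defensive parser, sketched below, can make the next step more robust. parseModelJson is a hypothetical helper, and the field that holds the reply in your OpenAI node output (for example message.content) is an assumption to verify in your own workflow.

```javascript
// Hypothetical helper: pull the outermost {...} span out of a model reply
// and parse it, ignoring any fence markers or prose around it.
function parseModelJson(reply) {
  const text = String(reply);
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model reply");
  }
  // JSON.parse still throws if the extracted span is not valid JSON.
  return JSON.parse(text.slice(start, end + 1));
}

// In an n8n Code node, usage might look like (field path is an assumption):
//   const reply = $input.first().json.message.content;
//   return [{ json: parseModelJson(reply) }];
```

This only handles a single top-level JSON object, which matches what the sample message asks for.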

Step 6: Structure Data for Google Sheets

Node: Code (JavaScript)

JavaScript Code: Get The Code Here

What This Does: Transforms the AI's JSON response into a flat structure suitable for Google Sheets, mapping complex nested data to individual columns.

Success Looks Like: Clean, structured data ready for spreadsheet insertion with all medical information properly categorized and formatted.
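As with Step 4, the linked code isn't shown here, so the following is only a sketch of one way to do the flattening. Because the exact JSON the model returns depends on your prompt, this example uses a generic approach: nested objects become dotted column names and arrays are collapsed into a single cell. The flatten function and the sample field names are hypothetical.

```javascript
// Recursively flatten a nested object into one level.
// - Nested objects -> "parent.child" keys
// - Arrays -> one cell, items joined with "; " (objects serialized as JSON)
// - null/undefined -> empty string
function flatten(value, prefix = "", out = {}) {
  if (Array.isArray(value)) {
    out[prefix] = value
      .map((item) =>
        item !== null && typeof item === "object" ? JSON.stringify(item) : String(item)
      )
      .join("; ");
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value === null || value === undefined ? "" : value;
  }
  return out;
}

// In an n8n Code node, usage might look like:
//   return [{ json: flatten($input.first().json) }];
```

For example, an input of { patient: { name: "..." }, critical_alerts: ["A", "B"] } yields the columns patient.name and critical_alerts, with the alerts joined as "A; B". If your sheet uses different headers, rename the keys before Step 7.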

Step 7: Save to Medical Database

Node: Google Sheets → Append Row

Settings:

  • Document From List: Select your lab reports tracking spreadsheet
  • Sheet From List: Choose the appropriate sheet (usually Sheet1)
  • Column Mapping: Map each extracted field to its corresponding spreadsheet column

What This Does: Adds each processed lab report as a new row in your medical database, creating a comprehensive tracking system for patient lab results and critical values.

Success Looks Like: Each lab report appears as a structured row in your spreadsheet organized for easy analysis and reporting.

Congrats! You've created an intelligent medical document processing system that transforms complex PDF lab reports into structured, searchable data.

Important Note: This workflow is designed for educational and workflow automation purposes. Always ensure compliance with healthcare data regulations (HIPAA, etc.) and proper data security measures when processing medical information.

Built something cool with this workflow? Share it with us @pdfdotco
