PDF to CSV API in PowerShell using PDF.co Web API


This article walks through implementing PDF to CSV conversion in PowerShell with the PDF.co Web API and using it in your application. PDF.co Web API is a REST API that provides data extraction functions and tools for document manipulation, including splitting and merging PDF files. It also includes built-in OCR and image recognition, and it can generate and read barcodes from images, scans, and PDFs.

The PowerShell sample below shows the parameters and options to set when calling the API to convert a PDF to CSV. To use it, copy the code into a .ps1 file in your code editor and run the script; PowerShell scripts do not need to be compiled. Then test the sample to check whether it meets the needs and requirements of your project.
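One quick way to check whether the output meets your needs is to load the generated CSV into PowerShell objects and look at the rows and columns. The sketch below assumes the CSV has a header row. The file path and the sample rows are made up for illustration, because the real columns depend on the source PDF; point `$csvPath` at the script's own output file (`.\result.csv`) to inspect a real conversion.

```powershell
# Hypothetical stand-in for the converter's output; real columns depend on the PDF.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) "result-preview.csv"
@'
Item,Quantity,Price
"Widget, large",2,9.50
Gadget,1,20.00
'@ | Set-Content -Path $csvPath -Encoding UTF8

# Import-Csv treats the first row as column headers and returns one object per row.
# @(...) keeps $rows an array even when the file has a single data row.
$rows = @(Import-Csv -Path $csvPath -Delimiter ",")

Write-Host "Rows: $($rows.Count)"
Write-Host "Columns: $($rows[0].PSObject.Properties.Name -join ', ')"

# Show the first few rows as a table for a visual check.
$rows | Select-Object -First 5 | Format-Table -AutoSize
```

If the separator is changed through the `CSVSeparatorSymbol` profile option, the same character would need to be passed to `-Delimiter`.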

A trial version can be downloaded from our website. It includes PowerShell source code samples and documentation.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

ConvertPdfToCsvFromUrlAsynchronously.ps1

      
# Cloud API asynchronous "PDF To CSV" job example.
# Helps avoid timeout errors when processing huge or scanned PDF documents.

# (!) If you are getting a '(403) Forbidden' error, please ensure you have set the correct API_KEY

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co/documentation/api
$API_KEY = "***********************************"

# Direct URL of source PDF file.
$SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-csv/sample.pdf"
# Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
$Pages = ""
# PDF document password. Leave empty for unprotected documents.
$Password = ""
# Destination CSV file name
$DestinationFile = ".\result.csv"
# (!) Run as an asynchronous job
$Async = $true

# Some of the advanced options available through profiles:
# (JSON can be single/double-quoted and contain comments.)
# {
#     "profiles": [
#         {
#             "profile1": {
#                 "CSVSeparatorSymbol": ",", // Separator symbol.
#                 "CSVQuotaionSymbol": "\"", // Quotation symbol.
#                 "ExtractInvisibleText": true, // Invisible text extraction. Values: true / false
#                 "ExtractShadowLikeText": true, // Shadow-like text extraction. Values: true / false
#                 "LineGroupingMode": "None", // Values: "None", "GroupByRows", "GroupByColumns", "JoinOrphanedRows"
#                 "ColumnDetectionMode": "ContentGroupsAndBorders", // Values: "ContentGroupsAndBorders", "ContentGroups", "Borders", "BorderedTables"
#                 "Unwrap": false, // Unwrap grouped text in table cells. Values: true / false
#                 "ShrinkMultipleSpaces": false, // Shrink multiple spaces in table cells that affect column detection. Values: true / false
#                 "DetectNewColumnBySpacesRatio": 1, // Spacing ratio that affects column detection.
#                 "CustomExtractionColumns": [ 0, 50, 150, 200, 250, 300 ], // Explicitly specify columns coordinates for table extraction.
#                 "CheckPermissions": true, // Ignore document permissions. Values: true / false
#             }
#         }
#     ]
# }

# Sample profile that sets advanced conversion options.
# Advanced options are properties of the CSVExtractor class from ByteScout PDF Extractor SDK used in the back-end:
# https://cdn.bytescout.com/help/BytescoutPDFExtractorSDK/html/87ce5fa6-3143-167d-abbd-bc7b5e160fe5.htm
$Profiles = '{ "profiles": [{ "profile1": { "OCRMode": "TextFromImagesAndVectorsAndFonts", "CSVSeparatorSymbol": "," } } ] }'

# Prepare URL for `PDF To CSV` API call
$query = "https://api.pdf.co/v1/pdf/convert/to/csv?name={0}&password={1}&pages={2}&url={3}&async={4}&profiles={5}" -f `
    $(Split-Path $DestinationFile -Leaf), $Password, $Pages, $SourceFileUrl, $Async, $Profiles
$query = [System.Uri]::EscapeUriString($query)

try {
    # Execute request
    $jsonResponse = Invoke-RestMethod -Method Get -Headers @{ "x-api-key" = $API_KEY } -Uri $query

    if ($jsonResponse.error -eq $false) {
        # Asynchronous job ID
        $jobId = $jsonResponse.jobId
        # URL of the generated CSV file that will be available after the job completes
        $resultFileUrl = $jsonResponse.url

        # Check the job status in a loop.
        do {
            $statusCheckUrl = "https://api.pdf.co/v1/job/check?jobid=" + $jobId
            $jsonStatus = Invoke-RestMethod -Method Get -Headers @{ "x-api-key" = $API_KEY } -Uri $statusCheckUrl

            # Display timestamp and status (for demo purposes)
            Write-Host "$(Get-date): $($jsonStatus.status)"

            if ($jsonStatus.status -eq "success") {
                # Download CSV file
                Invoke-WebRequest -Headers @{ "x-api-key" = $API_KEY } -OutFile $DestinationFile -Uri $resultFileUrl
                Write-Host "Generated CSV file saved as `"$($DestinationFile)`" file."
                break
            }
            elseif ($jsonStatus.status -eq "working") {
                # Pause for a few seconds
                Start-Sleep -Seconds 3
            }
            else {
                # Any other status means the job did not succeed; report it and stop
                Write-Host $jsonStatus.status
                break
            }
        }
        while ($true)
    }
    else {
        # Display service reported error
        Write-Host $jsonResponse.message
    }
}
catch {
    # Display request error
    Write-Host $_.Exception
}
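The script above escapes the whole query with `[System.Uri]::EscapeUriString`, which leaves reserved characters such as `&`, `=`, and `#` untouched, so a password or URL containing them could be split into the wrong parameters. A minimal sketch of one alternative is to escape each value separately with `[System.Uri]::EscapeDataString` and to build the profiles JSON from a hashtable. The helper name `New-PdfToCsvQuery` and its parameters are hypothetical and not part of PDF.co; the endpoint and parameter names are the ones used in the script above.

```powershell
# Hypothetical helper (not part of PDF.co): builds the request URL with
# every parameter value escaped on its own.
function New-PdfToCsvQuery {
    param(
        [Parameter(Mandatory)] [string] $SourceFileUrl,
        [Parameter(Mandatory)] [string] $Name,
        [string]    $Password = "",
        [string]    $Pages    = "",
        [bool]      $Async    = $true,
        [hashtable] $Options  = @{}
    )

    # Build the profiles JSON from a hashtable instead of a hand-written string.
    $profilesJson = @{ profiles = @(@{ profile1 = $Options }) } |
        ConvertTo-Json -Depth 5 -Compress

    # Ordered so the resulting query string is predictable.
    $params = [ordered]@{
        name     = $Name
        password = $Password
        pages    = $Pages
        url      = $SourceFileUrl
        async    = $Async.ToString().ToLower()
        profiles = $profilesJson
    }

    # Escape each value separately so '&', '=' or '#' inside a value
    # cannot be mistaken for query-string syntax.
    $pairs = foreach ($key in $params.Keys) {
        "{0}={1}" -f $key, [System.Uri]::EscapeDataString([string]$params[$key])
    }

    "https://api.pdf.co/v1/pdf/convert/to/csv?" + ($pairs -join "&")
}

# Example call using the sample document from the script above.
$query = New-PdfToCsvQuery `
    -SourceFileUrl "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-csv/sample.pdf" `
    -Name "result.csv" `
    -Options @{ CSVSeparatorSymbol = "," }
Write-Host $query
```

The resulting `$query` could be passed to `Invoke-RestMethod` in place of the one built with the `-f` operator. Note that this sketch sends `async=true` in lowercase, whereas the original formatting produces `True`.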

run.bat

      
@echo off

powershell -NoProfile -ExecutionPolicy Bypass -Command "& .\ConvertPdfToCsvFromUrlAsynchronously.ps1"
echo Script finished with errorlevel=%errorlevel%

pause
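Because run.bat waits for the script to exit, the `do { ... } while ($true)` loop in ConvertPdfToCsvFromUrlAsynchronously.ps1 decides how long the batch file can block. A minimal sketch of a bounded wait is shown here. `Wait-ForJobStatus`, its parameters, and the `"timeout"` return value are hypothetical names chosen for illustration; the `"working"` status string is the one the script checks for.

```powershell
# Hypothetical helper: polls a status source until it stops reporting
# "working" or until the time limit is reached.
function Wait-ForJobStatus {
    param(
        # Script block that returns the current job status as a string.
        [Parameter(Mandatory)] [scriptblock] $GetStatus,
        [int] $TimeoutSeconds = 300,
        [int] $DelaySeconds   = 3
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    while ((Get-Date) -lt $deadline) {
        $status = & $GetStatus

        # Any status other than "working" ends the wait; the caller decides
        # what to do with it (download on "success", report otherwise).
        if ($status -ne "working") {
            return $status
        }

        Start-Sleep -Seconds $DelaySeconds
    }

    # The job was still running when the time limit was reached.
    return "timeout"
}

# Possible use inside the script, assuming $API_KEY and $statusCheckUrl are set:
# $final = Wait-ForJobStatus -GetStatus {
#     (Invoke-RestMethod -Method Get -Headers @{ "x-api-key" = $API_KEY } -Uri $statusCheckUrl).status
# }
# if ($final -eq "success") { <# download the CSV as in the script #> }
```

Taking the status source as a script block keeps the helper independent of the HTTP call, so the waiting logic can be tried without an API key.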
