How to Convert PDF to CSV from the Uploaded File for PDF to CSV API in Python with PDF.co Web API

Learn in Simple Steps: How to Convert an Uploaded PDF File to CSV in Python

Developers of any level can write Python code that converts an uploaded PDF file to CSV using PDF.co Web API. PDF.co Web API is a set of tools for document manipulation, data conversion, data extraction, and splitting and merging of documents. It also includes image recognition, built-in OCR, barcode generation, and barcode decoders that read barcodes from scans, pictures, and PDFs.

A Python code snippet like this one works best when you need to add PDF to CSV conversion to your Python application quickly. Copy and paste the sample into your project, set your API key and file paths, and run the script. You can then extend the code to suit your own needs.

 

Let’s dive into the source code, then look at its output and do a brief analysis. The source code is as follows.

Source Code

      
    import os
    import requests  # pip install requests

    # The authentication key (API Key).
    # Get your own by registering at https://app.pdf.co/documentation/api
    API_KEY = "******************************************"

    # Base URL for PDF.co Web API requests
    BASE_URL = "https://api.pdf.co/v1"

    # Source PDF file
    SourceFile = ".\\sample.pdf"
    # Comma-separated list of page indices (or ranges) to process.
    # Leave empty for all pages. Example: '0,2-5,7-'.
    Pages = ""
    # PDF document password. Leave empty for unprotected documents.
    Password = ""
    # Destination CSV file name
    DestinationFile = ".\\result.csv"


    def main(args=None):
        uploadedFileUrl = uploadFile(SourceFile)
        if uploadedFileUrl is not None:
            convertPdfToCSV(uploadedFileUrl, DestinationFile)


    def convertPdfToCSV(uploadedFileUrl, destinationFile):
        """Converts PDF To CSV using PDF.co Web API"""

        # Prepare request params
        # See documentation: https://apidocs.pdf.co
        parameters = {}
        parameters["name"] = os.path.basename(destinationFile)
        parameters["password"] = Password
        parameters["pages"] = Pages
        parameters["url"] = uploadedFileUrl

        # Prepare URL for 'PDF To CSV' API request
        url = "{}/pdf/convert/to/csv".format(BASE_URL)

        # Execute request and get response as JSON
        response = requests.post(url, data=parameters, headers={"x-api-key": API_KEY})
        if response.status_code == 200:
            json = response.json()

            if json["error"] == False:
                # Get URL of result file
                resultFileUrl = json["url"]
                # Download result file
                r = requests.get(resultFileUrl, stream=True)
                if r.status_code == 200:
                    with open(destinationFile, 'wb') as file:
                        for chunk in r:
                            file.write(chunk)
                    print(f"Result file saved as \"{destinationFile}\" file.")
                else:
                    # Report the status of the download request, not the conversion request
                    print(f"Request error: {r.status_code} {r.reason}")
            else:
                # Show service reported error
                print(json["message"])
        else:
            print(f"Request error: {response.status_code} {response.reason}")


    def uploadFile(fileName):
        """Uploads file to the cloud"""

        # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.

        # Prepare URL for 'Get Presigned URL' API request
        url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format(
            BASE_URL, os.path.basename(fileName))

        # Execute request and get response as JSON
        response = requests.get(url, headers={"x-api-key": API_KEY})
        if response.status_code == 200:
            json = response.json()

            if json["error"] == False:
                # URL to use for file upload
                uploadUrl = json["presignedUrl"]
                # URL for future reference
                uploadedFileUrl = json["url"]

                # 2. UPLOAD FILE TO CLOUD.
                with open(fileName, 'rb') as file:
                    requests.put(uploadUrl, data=file,
                                 headers={"x-api-key": API_KEY, "content-type": "application/octet-stream"})

                return uploadedFileUrl
            else:
                # Show service reported error
                print(json["message"])
        else:
            print(f"Request error: {response.status_code} {response.reason}")

        return None


    if __name__ == '__main__':
        main()

 

Output

[Image: Python PDF to CSV]

 

Analysis

The PDF.co API endpoint ‘/pdf/convert/to/csv’ converts a PDF to the CSV file format. This endpoint expects input parameters such as:

  • URL of input file
  • Pages to be converted
  • Password of PDF document, in case PDF is password protected
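The parameters listed above can be gathered in one small helper. This is only a minimal sketch: the function name `build_csv_params` is hypothetical, and the field names (`name`, `url`, `pages`, `password`) are simply those used in the sample code above.

```python
import os


def build_csv_params(file_url, destination_file, pages="", password=""):
    """Build the form fields for a '/pdf/convert/to/csv' request.

    Hypothetical helper; the field names mirror the sample code above.
    """
    if not file_url:
        raise ValueError("file_url must not be empty")
    return {
        "name": os.path.basename(destination_file),
        "url": file_url,
        "pages": pages,        # e.g. "0,2-5,7-"; empty string means all pages
        "password": password,  # empty string for unprotected documents
    }


params = build_csv_params("https://example.com/sample.pdf", "out/result.csv", pages="0,2-5")
print(params)
```

The resulting dictionary can be passed as `data=params` to `requests.post`, as the sample does with its `parameters` dictionary.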

We can also provide additional request parameters to adjust the output format, for example to use a tab or pipe symbol as the delimiter instead of a comma. For more information, please see the official documentation at https://apidocs.pdf.co.
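If you would rather not rely on a request parameter, the delimiter can also be changed locally after the CSV has been downloaded. The sketch below is that alternative, not the API-side option mentioned above: it uses only Python's standard `csv` module, and the function name `redelimit_csv` is hypothetical.

```python
import csv
import io


def redelimit_csv(text, delimiter="\t"):
    """Re-write comma-separated text using another delimiter."""
    rows = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


sample = 'name,amount\n"Smith, J.",10\n'
print(redelimit_csv(sample, "|"))
```

Going through `csv.reader` and `csv.writer` rather than a plain string replace keeps commas inside quoted fields intact.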

Every PDF.co request must carry the API key in the “x-api-key” header, as you’ve seen in this sample.

    # Execute request and get response as JSON
    response = requests.get(url, headers={ "x-api-key": API_KEY })

This API key authenticates your PDF.co requests and can be obtained by signing up for PDF.co.
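Rather than hard-coding the key in the script, you may prefer to read it from the environment. The following is a minimal sketch under that assumption; the environment variable name `PDFCO_API_KEY` and the helper name `auth_headers` are made up for this example and are not part of PDF.co.

```python
import os


def auth_headers(env_var="PDFCO_API_KEY"):
    """Return request headers carrying the API key.

    The environment variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"x-api-key": api_key}
```

With this in place, a request would look like `requests.get(url, headers=auth_headers())`.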

Code-wise, we’re doing two things here: first, uploading our file to the PDF.co cloud (uploadFile function), and then passing the uploaded file’s URL to the CSV conversion (convertPdfToCSV function).
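Both functions read the same `error` and `message` fields from the JSON response, so that check could be factored into one place. This is a sketch of one possible refactoring, not part of the original sample; the names `PdfCoError` and `check_payload` are hypothetical.

```python
class PdfCoError(RuntimeError):
    """Raised when the service reports an error (hypothetical name)."""


def check_payload(payload):
    """Return the decoded JSON payload unchanged, or raise if it reports an error."""
    if payload.get("error"):
        raise PdfCoError(payload.get("message", "unknown error"))
    return payload


ok = check_payload({"error": False, "url": "https://example.com/result.csv"})
print(ok["url"])
```

Raising an exception instead of printing lets the caller decide whether to retry, log, or stop.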

Please try this sample on your machine with your PDF files for better understanding. Thank you!

 
