How to parse from uploaded file for document parser API in Python using PDF.co Web API

How to parse from uploaded file in Python with easy ByteScout code samples to make document parser API. Step-by-step tutorial

This page displays the step-by-step instructions and algorithm of how to parse from uploaded file and how to apply it in your application. Document parser API in Python can be applied with PDF.co Web API. PDF.co Web API is the flexible Web API that includes full set of functions from e-signature requests to data extraction, OCR, images recognition, pdf splitting and pdf splitting. Can also generate barcodes and read barcodes from images, scans and pdf.

Use the code displayed below in your application to save a lot of time on writing and testing code. For implementation of this functionality, please copy and paste the code below into your app using code editor. Then compile and run your app. This basic programming language sample code for Python will do the whole work for you in implementing document parser API in your app.

Our website provides free trial version of PDF.co Web API that gives source code samples to assist with your Python project.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

MultiPageTable-template1.yml

      
--- # Template that demonstrates parsing of multi-page table using only # regular expressions for the table start, end, and rows. # If regular expression cannot be written for every table row (for example, # if the table contains empty cells), try the second method demonstrated # in `MultiPageTable-template2.yml` template. templateVersion: 3 templatePriority: 0 sourceId: Multipage Table Test detectionRules: keywords: - Sample document with multi-page table fields: total: type: regex expression: TOTAL {{DECIMAL}} dataType: decimal tables: - name: table1 start: # regular expression to find the table start in document expression: Item\s+Description\s+Price\s+Qty\s+Extended Price end: # regular expression to find the table end in document expression: TOTAL\s+\d+\.\d\d row: # regular expression to find table rows expression: '^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)' columns: - name: itemNo type: integer - name: description type: string - name: price type: decimal - name: qty type: integer - name: extPrice type: decimal multipage: true

ParseFromUploadedFile.py

      
import os import requests # pip install requests # The authentication key (API Key). # Get your own by registering at https://app.pdf.co/documentation/api API_KEY = "*************************************" # Base URL for PDF.co Web API requests BASE_URL = "https://api.pdf.co/v1" # Source PDF file SourceFile = ".\\MultiPageTable.pdf" # Destination JSON file name DestinationFile = ".\\result.json" # Template text. Use Document Parser SDK (https://bytescout.com/products/developer/documentparsersdk/index.html) # to create templates. # Read template from file: file_read = open(".\\MultiPageTable-template1.yml", mode='r', encoding="utf-8",errors="ignore") Template = file_read.read() file_read.close() def main(args = None): uploadedFileUrl = uploadFile(SourceFile) if (uploadedFileUrl != None): PerformDocumentParser(uploadedFileUrl, Template, DestinationFile) def PerformDocumentParser(uploadedFileUrl, template, destinationFile): # Content data = { 'url': uploadedFileUrl, 'template': template } # Prepare URL for 'Document Parser' API request url = "{}/pdf/documentparser".format(BASE_URL) # Execute request and get response as JSON response = requests.post(url, data= data, headers={ "x-api-key": API_KEY }) if (response.status_code == 200): json = response.json() if json["error"] == False: # Get URL of result file resultFileUrl = json["url"] # Download result file r = requests.get(resultFileUrl, stream=True) if (r.status_code == 200): with open(destinationFile, 'wb') as file: for chunk in r: file.write(chunk) print(f"Result file saved as \"{destinationFile}\" file.") else: print(f"Request error: {response.status_code} {response.reason}") else: # Show service reported error print(json["message"]) else: print(f"Request error: {response.status_code} {response.reason}") def uploadFile(fileName): """Uploads file to the cloud""" # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE. # Prepare URL for 'Get Presigned URL' API request url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format( BASE_URL, os.path.basename(fileName)) # Execute request and get response as JSON response = requests.get(url, headers={"x-api-key": API_KEY}) if (response.status_code == 200): json = response.json() if json["error"] == False: # URL to use for file upload uploadUrl = json["presignedUrl"] # URL for future reference uploadedFileUrl = json["url"] # 2. UPLOAD FILE TO CLOUD. with open(fileName, 'rb') as file: requests.put(uploadUrl, data=file, headers={"x-api-key": API_KEY, "content-type": "application/octet-stream"}) return uploadedFileUrl else: # Show service reported error print(json["message"]) else: print(f"Request error: {response.status_code} {response.reason}") return None if __name__ == '__main__': main()

VIDEO

ON-PREMISE OFFLINE SDK

Get 60 Day Free Trial

See also:

ON-DEMAND REST WEB API

Get Your API Key

See also:

Related Samples: