How to parse from url for document parser API in Python using PDF.co Web API

How to parse from url in Python with easy ByteScout code samples to make document parser API. Step-by-step tutorial

If you want a quick and easy way to add a required functionality into your application then check this sample source code documentation. Document parser API in Python can be applied with PDF.co Web API. PDF.co Web API is the Web API with a set of tools for documents manipulation, data conversion, data extraction, splitting and merging of documents. Includes image recognition, built-in OCR, barcode generation and barcode decoders to decode bar codes from scans, pictures and pdf.

Use the code displayed below in your application to save a lot of time on writing and testing code. For implementation of this functionality, please copy and paste the code below into your app using code editor. Then compile and run your app. Enjoy writing a code with ready-to-use sample Python codes to add document parser API functions using PDF.co Web API in Python.

Free! Free! Free! ByteScout free trial version is available for FREE download from our website. Programming tutorials along with source code samples are assembled.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

MultiPageTable-template1.yml

      
--- # Template that demonstrates parsing of multi-page table using only # regular expressions for the table start, end, and rows. # If regular expression cannot be written for every table row (for example, # if the table contains empty cells), try the second method demonstrated # in `MultiPageTable-template2.yml` template. templateVersion: 3 templatePriority: 0 sourceId: Multipage Table Test detectionRules: keywords: - Sample document with multi-page table fields: total: type: regex expression: TOTAL {{DECIMAL}} dataType: decimal tables: - name: table1 start: # regular expression to find the table start in document expression: Item\s+Description\s+Price\s+Qty\s+Extended Price end: # regular expression to find the table end in document expression: TOTAL\s+\d+\.\d\d row: # regular expression to find table rows expression: '^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)' columns: - name: itemNo type: integer - name: description type: string - name: price type: decimal - name: qty type: integer - name: extPrice type: decimal multipage: true

ParseFromUrl.py

      
import os import requests # pip install requests # The authentication key (API Key). # Get your own by registering at https://app.pdf.co/documentation/api API_KEY = "*************************************" # Base URL for PDF.co Web API requests BASE_URL = "https://api.pdf.co/v1" # Source PDF file url SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf" # Destination JSON file name DestinationFile = ".\\result.json" # Template text. Use Document Parser SDK (https://bytescout.com/products/developer/documentparsersdk/index.html) # to create templates. # Read template from file: file_read = open(".\\MultiPageTable-template1.yml", mode='r', encoding="utf-8",errors="ignore") Template = file_read.read() file_read.close() def main(args = None): PerformDocumentParser(SourceFileUrl, Template, DestinationFile) def PerformDocumentParser(uploadedFileUrl, template, destinationFile): # Content data = { 'url': uploadedFileUrl, 'template': template } # Prepare URL for 'Document Parser' API request url = "{}/pdf/documentparser".format(BASE_URL) # Execute request and get response as JSON response = requests.post(url, data= data, headers={ "x-api-key": API_KEY }) if (response.status_code == 200): json = response.json() if json["error"] == False: # Get URL of result file resultFileUrl = json["url"] # Download result file r = requests.get(resultFileUrl, stream=True) if (r.status_code == 200): with open(destinationFile, 'wb') as file: for chunk in r: file.write(chunk) print(f"Result file saved as \"{destinationFile}\" file.") else: print(f"Request error: {response.status_code} {response.reason}") else: # Show service reported error print(json["message"]) else: print(f"Request error: {response.status_code} {response.reason}") if __name__ == '__main__': main()

VIDEO

ON-PREMISE OFFLINE SDK

Get 60 Day Free Trial

See also:

ON-DEMAND REST WEB API

Get Your API Key

See also:

Related Samples: