How to parse from url for document parser API in Python using PDF.co Web API
How to parse from url in Python with easy ByteScout code samples to make document parser API. Step-by-step tutorial
If you want a quick and easy way to add a required functionality into your application then check this sample source code documentation. Document parser API in Python can be applied with PDF.co Web API. PDF.co Web API is the Web API with a set of tools for documents manipulation, data conversion, data extraction, splitting and merging of documents. Includes image recognition, built-in OCR, barcode generation and barcode decoders to decode bar codes from scans, pictures and pdf.
Use the code displayed below in your application to save a lot of time on writing and testing code. For implementation of this functionality, please copy and paste the code below into your app using code editor. Then compile and run your app. Enjoy writing a code with ready-to-use sample Python codes to add document parser API functions using PDF.co Web API in Python.
Free! Free! Free! ByteScout free trial version is available for FREE download from our website. Programming tutorials along with source code samples are assembled.
On-demand (REST Web API) version:
Web API (on-demand version)
On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)
MultiPageTable-template1.yml
templateName: Multipage Table Test templateVersion: 4 templatePriority: 0 detectionRules: keywords: - Sample document with multi-page table objects: - name: total objectType: field fieldProperties: fieldType: macros expression: TOTAL{{Spaces}}({{Number}}) regex: true dataType: decimal - name: table1 objectType: table tableProperties: start: expression: Item{{Spaces}}Description{{Spaces}}Price regex: true end: expression: TOTAL{{Spaces}}{{Number}} regex: true row: expression: '{{LineStart}}{{Spaces}}(?<itemNo>{{Digits}}){{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<price>{{Number}}){{Spaces}}(?<qty>{{Digits}}){{Spaces}}(?<extPrice>{{Number}})' regex: true columns: - name: itemNo dataType: integer - name: description dataType: string - name: price dataType: decimal - name: qty dataType: integer - name: extPrice dataType: decimal multipage: true
ParseFromUrl.py
import os import requests # pip install requests # The authentication key (API Key). # Get your own by registering at https://app.pdf.co/documentation/api API_KEY = "*************************************" # Base URL for PDF.co Web API requests BASE_URL = "https://api.pdf.co/v1" # Source PDF file url SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf" # Destination JSON file name DestinationFile = ".\\result.json" # Template text. Use Document Parser SDK (https://bytescout.com/products/developer/documentparsersdk/index.html) # to create templates. # Read template from file: file_read = open(".\\MultiPageTable-template1.yml", mode='r', encoding="utf-8",errors="ignore") Template = file_read.read() file_read.close() def main(args = None): PerformDocumentParser(SourceFileUrl, Template, DestinationFile) def PerformDocumentParser(uploadedFileUrl, template, destinationFile): # Content data = { 'url': uploadedFileUrl, 'template': template } # Prepare URL for 'Document Parser' API request url = "{}/pdf/documentparser".format(BASE_URL) # Execute request and get response as JSON response = requests.post(url, data= data, headers={ "x-api-key": API_KEY }) if (response.status_code == 200): json = response.json() if json["error"] == False: # Get URL of result file resultFileUrl = json["url"] # Download result file r = requests.get(resultFileUrl, stream=True) if (r.status_code == 200): with open(destinationFile, 'wb') as file: for chunk in r: file.write(chunk) print(f"Result file saved as \"{destinationFile}\" file.") else: print(f"Request error: {response.status_code} {response.reason}") else: # Show service reported error print(json["message"]) else: print(f"Request error: {response.status_code} {response.reason}") if __name__ == '__main__': main()
VIDEO
ON-PREMISE OFFLINE SDK
See also:
ON-DEMAND REST WEB API
Get Your API Key
See also:
PDF-co-Web-API-Python-Parse-From-Url.pdf