How to PDF text search API in Python using PDF.co Web API
Learn to PDF text search API in Python
Sample source code below will display you how to manage a complex task like PDF text search API in Python. PDF.co Web API is the flexible Web API that includes full set of functions from e-signature requests to data extraction, OCR, images recognition, pdf splitting and pdf splitting. Can also generate barcodes and read barcodes from images, scans and pdf. It can be applied to PDF text search API using Python.
The SDK samples given below describe how to quickly make your application do PDF text search API in Python with the help of PDF.co Web API. Follow the instructions from scratch to work and copy the Python code. Enjoy writing a code with ready-to-use source source code samples in Python.
All these programming tutorials along with source code samples and ByteScout free trial version are available for download from our website.
On-demand (REST Web API) version:
Web API (on-demand version)
On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)
PdfTextSearchFromUploadedFile.py
import os import requests # pip install requests # The authentication key (API Key). # Get your own by registering at https://app.pdf.co/documentation/api API_KEY = "******************************************" # Base URL for PDF.co Web API requests BASE_URL = "https://api.pdf.co/v1" # Source PDF file SourceFile = ".\\sample.pdf" # Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. Pages = "" # PDF document password. Leave empty for unprotected documents. Password = "" # Search string. SearchString = "\d{1,}\.\d\d" # Regular expression to find numbers like '100.00' # Note: do not use `+` char in regex, but use `{1,}` instead. # `+` char is valid for URL and will not be escaped, and it will become a space char on the server side. # Enable regular expressions (Regex) RegexSearch = True def main(args = None): uploadedFileUrl = uploadFile(SourceFile) if (uploadedFileUrl != None): searchTextInPDF(uploadedFileUrl) def searchTextInPDF(uploadedFileUrl): """Search Text using PDF.co Web API""" # Prepare requests params as JSON # See documentation: https://apidocs.pdf.co parameters = {} parameters["password"] = Password parameters["pages"] = Pages parameters["url"] = uploadedFileUrl parameters["searchString"] = SearchString parameters["regexSearch"] = RegexSearch # Prepare URL for 'PDF Text Search' API request url = "{}/pdf/find".format(BASE_URL) # Execute request and get response as JSON response = requests.post(url, data=parameters, headers={ "x-api-key": API_KEY }) if (response.status_code == 200): json = response.json() if json["error"] == False: # Display found information for item in json["body"]: print(f"Found text {item['text']} at coordinates {item['left']}, {item['top']}") else: # Show service reported error print(json["message"]) else: print(f"Request error: {response.status_code} {response.reason}") def uploadFile(fileName): """Uploads file to the cloud""" # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE. # Prepare URL for 'Get Presigned URL' API request url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format( BASE_URL, os.path.basename(fileName)) # Execute request and get response as JSON response = requests.get(url, headers={ "x-api-key": API_KEY }) if (response.status_code == 200): json = response.json() if json["error"] == False: # URL to use for file upload uploadUrl = json["presignedUrl"] # URL for future reference uploadedFileUrl = json["url"] # 2. UPLOAD FILE TO CLOUD. with open(fileName, 'rb') as file: requests.put(uploadUrl, data=file, headers={ "x-api-key": API_KEY, "content-type": "application/octet-stream" }) return uploadedFileUrl else: # Show service reported error print(json["message"]) else: print(f"Request error: {response.status_code} {response.reason}") return None if __name__ == '__main__': main()
VIDEO
ON-PREMISE OFFLINE SDK
See also:
ON-DEMAND REST WEB API
Get Your API Key
See also:
PDF-co-Web-API-Python-PDF-Text-Search-from-Uploaded-File.pdf