How to PDF make searchable API in Python with PDF.co Web API
How to write a robust code in Python to PDF make searchable API with this step-by-step tutorial
ByteScout simple and easy to understand tutorials are planned to describe the code for both Python beginners and advanced programmers. What is PDF.co Web API? It is the flexible Web API that includes full set of functions from e-signature requests to data extraction, OCR, images recognition, pdf splitting and pdf splitting. Can also generate barcodes and read barcodes from images, scans and pdf. It can help you to PDF make searchable API in your Python application.
The SDK samples given below describe how to quickly make your application do PDF make searchable API in Python with the help of PDF.co Web API. Just copy and paste the code into your Python application’s code and follow the instructions. Applying Python application mostly includes various stages of the software development so even if the functionality works please test it with your data and the production environment.
All these programming tutorials along with source code samples and ByteScout free trial version are available for download from our website.
On-demand (REST Web API) version:
Web API (on-demand version)
On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)
MakeSearchablePdfFromUrlAsynchronously.py
""" Cloud API asynchronous "PDF To Text" job example. Allows to avoid timeout errors when processing huge or scanned PDF documents. """ import os import requests # pip install requests import time import datetime # The authentication key (API Key). # Get your own by registering at https://app.pdf.co/documentation/api API_KEY = "******************************************" # Base URL for PDF.co Web API requests BASE_URL = "https://api.pdf.co/v1" # Direct URL of source PDF file. SourceFileURL = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-make-searchable/sample.pdf" # Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. Pages = "" # PDF document password. Leave empty for unprotected documents. Password = "" # Destination PDF file name DestinationFile = ".\\result.pdf" # OCR language. "eng", "fra", "deu", "spa" supported currently. Let us know if you need more. Language = "eng" # (!) Make asynchronous job Async = True def main(args = None): makeSearchablePDF(SourceFileURL, DestinationFile) def makeSearchablePDF(uploadedFileUrl, destinationFile): """Make Searchable PDF using PDF.co Web API""" # Prepare requests params as JSON # See documentation: https://apidocs.pdf.co parameters = {} parameters["async"] = Async parameters["name"] = os.path.basename(destinationFile) parameters["password"] = Password parameters["pages"] = Pages parameters["lang"] = Language parameters["url"] = uploadedFileUrl # Prepare URL for 'Make Searchable PDF' API request url = "{}/pdf/makesearchable".format(BASE_URL) # Execute request and get response as JSON response = requests.post(url, data=parameters, headers={ "x-api-key": API_KEY }) if (response.status_code == 200): json = response.json() if json["error"] == False: # Asynchronous job ID jobId = json["jobId"] # URL of the result file resultFileUrl = json["url"] # Check the job status in a loop. # If you don't want to pause the main thread you can rework the code # to use a separate thread for the status checking and completion. while True: status = checkJobStatus(jobId) # Possible statuses: "working", "failed", "aborted", "success". # Display timestamp and status (for demo purposes) print(datetime.datetime.now().strftime("%H:%M.%S") + ": " + status) if status == "success": # Download result file r = requests.get(resultFileUrl, stream=True) if (r.status_code == 200): with open(destinationFile, 'wb') as file: for chunk in r: file.write(chunk) print(f"Result file saved as \"{destinationFile}\" file.") else: print(f"Request error: {response.status_code} {response.reason}") break elif status == "working": # Pause for a few seconds time.sleep(3) else: print(status) break else: # Show service reported error print(json["message"]) else: print(f"Request error: {response.status_code} {response.reason}") def checkJobStatus(jobId): """Checks server job status""" url = f"{BASE_URL}/job/check?jobid={jobId}" response = requests.get(url, headers={ "x-api-key": API_KEY }) if (response.status_code == 200): json = response.json() return json["status"] else: print(f"Request error: {response.status_code} {response.reason}") return None if __name__ == '__main__': main()
VIDEO
ON-PREMISE OFFLINE SDK
See also:
ON-DEMAND REST WEB API
Get Your API Key
See also:
PDF-co-Web-API-Python-Make-Searchable-PDF-From-URL-Asynchronously.pdf