How to convert PDF to XML from an uploaded file using the PDF to XML API in Python with Web API

Learn how to write code that converts PDF to XML from an uploaded file with the PDF to XML API in Python: a simple how-to tutorial

This documentation is written to help you implement all the necessary features on your side. Web API can be used to convert PDF to XML in Python. Web API is a flexible Web API that includes a full set of functions, from e-signature requests to data extraction, OCR, image recognition, and PDF splitting. It can also generate barcodes and read barcodes from images, scans, and PDFs.

The SDK samples below show how to quickly make your application convert PDF to XML in Python with the help of Web API. You can copy and paste this Python sample code into your project. You can also get the source code from our GitHub repository at this location. Once done, just run your script. Because writing a Python application involves several stages of software development, please check the functionality with your own data and in your production environment even if it works in the sample. A free trial version of Web API is available on our website, and it includes other code samples to help you with your Python application.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

import os
import requests # pip install requests

# The authentication key (API Key).
# Get your own by registering at
API_KEY = "******************************************"

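# Base URL for Web API requests
# Placeholder: set this to the base URL given in the Web API documentation.
BASE_URL = "<BASE_URL>"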
# Source PDF file
SourceFile = ".\\sample.pdf"
# Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
Pages = ""
# PDF document password. Leave empty for unprotected documents.
Password = ""
# Destination XML file name
DestinationFile = ".\\result.xml"

def main(args = None):
    uploadedFileUrl = uploadFile(SourceFile)
    if (uploadedFileUrl != None):
        convertPdfToXml(uploadedFileUrl, DestinationFile)

def convertPdfToXml(uploadedFileUrl, destinationFile):
    """Converts PDF To XML using Web API"""

    # Prepare request parameters (sent as form fields)
    # See documentation:
    parameters = {}
    parameters["name"] = os.path.basename(destinationFile)
    parameters["password"] = Password
    parameters["pages"] = Pages
    parameters["url"] = uploadedFileUrl

    # Prepare URL for 'PDF To XML' API request
    url = "{}/pdf/convert/to/xml".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, data=parameters, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            # Get URL of result file
            resultFileUrl = json["url"]
            # Download result file in chunks
            r = requests.get(resultFileUrl, stream=True)
            if (r.status_code == 200):
                with open(destinationFile, 'wb') as file:
                    for chunk in r.iter_content(chunk_size=8192):
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error (the "message" field name is assumed)
            print(json.get("message", json))
    else:
        print(f"Request error: {response.status_code} {response.reason}")

def uploadFile(fileName):
    """Uploads file to the cloud"""

    # Prepare URL for 'Get Presigned URL' API request
    url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format(
        BASE_URL, os.path.basename(fileName))
    # Execute request and get response as JSON
    response = requests.get(url, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()
        if json["error"] == False:
            # URL to use for file upload
            uploadUrl = json["presignedUrl"]
            # URL for future reference
            uploadedFileUrl = json["url"]

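            # 2. UPLOAD FILE TO CLOUD.
            with open(fileName, 'rb') as file:
                uploadResponse = requests.put(uploadUrl, data=file, headers={ "x-api-key": API_KEY, "content-type": "application/octet-stream" })
            if not uploadResponse.ok:
                print(f"Upload error: {uploadResponse.status_code} {uploadResponse.reason}")
                return None

            return uploadedFileUrl
        else:
            # Show service reported error (the "message" field name is assumed)
            print(json.get("message", json))
    else:
        print(f"Request error: {response.status_code} {response.reason}")

    return None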

if __name__ == '__main__':
    main()


Now that we've reviewed the source code, let's analyze it a bit.

First, we gather all the information needed for the PDF to XML conversion request. The API_KEY variable holds the API key, which is passed in the request header for authentication. We also specify the source PDF file (SourceFile), the page numbers (Pages) whose data will be converted to XML, the destination path (DestinationFile) where the output XML will be stored, and the password (Password) for protected documents.
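The Pages string follows the format given in the sample's comment ('0,2-5,7-', or empty for all pages). The following is a minimal sketch, not part of the Web API SDK, that gathers these settings into the request parameters and rejects malformed page lists before any request is sent. The helper names build_conversion_params and build_headers are hypothetical, and the validation pattern only accepts the forms shown in that comment; the service itself may accept more.

```python
import os
import re

# Accepts the forms shown in the sample's comment, e.g. "0" or "0,2-5,7-".
# This pattern is an assumption; the service may accept other forms too.
PAGES_PATTERN = re.compile(r"\d+(-\d*)?(,\d+(-\d*)?)*")


def build_conversion_params(uploaded_file_url, destination_file, pages="", password=""):
    """Build the form parameters for the PDF to XML request (hypothetical helper)."""
    pages = pages.replace(" ", "")  # tolerate "0, 2-5"
    if pages and not PAGES_PATTERN.fullmatch(pages):
        raise ValueError(f"Invalid page list: {pages!r}")
    return {
        "name": os.path.basename(destination_file),
        "password": password,
        "pages": pages,  # empty string means all pages
        "url": uploaded_file_url,
    }


def build_headers(api_key):
    """The API key travels in the x-api-key header, as in the sample."""
    return {"x-api-key": api_key}
```

Validating locally means a typo such as "0,,2" fails immediately with a clear message instead of costing a round trip to the service.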

The program is logically divided into two functions, uploadFile and convertPdfToXml. As the names suggest, uploadFile uploads the PDF to the cloud and returns its public URL, and convertPdfToXml uses that public URL to perform the XML conversion.

Uploading the input PDF file to the cloud is a simple, straightforward process. First, we request a pre-signed URL from the /file/upload/get-presigned-url endpoint, passing the input file name in the request. The response contains the pre-signed URL and the public URL of the uploaded file. We then use the pre-signed URL to upload the actual file with a PUT request. Please note that files uploaded to the cloud are temporary and are only available for a few hours.

The /pdf/convert/to/xml endpoint performs the PDF to XML conversion. For this request, we prepare the request parameters (output name, password, pages, and the uploaded file URL) and pass the API key in the x-api-key request header. When the request completes, the url field of the JSON response contains the link to the converted XML file, which the code then downloads to DestinationFile.
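One detail worth hardening: the sample builds the query string with str.format, so a file name containing spaces or "&" would produce a malformed URL. Below is a minimal sketch that encodes the query with Python's urllib.parse and reads the two URLs from the JSON response. The helper names presigned_url_request and read_upload_urls are hypothetical, and the "message" field used for error text is an assumption; the sketch falls back to the whole payload if it is absent.

```python
import os
from urllib.parse import quote, urlencode


def presigned_url_request(base_url, file_name):
    """Build the 'Get Presigned URL' request URL with an encoded query string."""
    query = urlencode(
        {
            "contenttype": "application/octet-stream",
            "name": os.path.basename(file_name),
        },
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{base_url}/file/upload/get-presigned-url?{query}"


def read_upload_urls(response_json):
    """Return (presigned_url, public_url), or raise if the service reported an error."""
    if response_json.get("error"):
        # The 'message' field name is assumed; fall back to the whole payload.
        raise RuntimeError(response_json.get("message", response_json))
    return response_json["presignedUrl"], response_json["url"]
```

Raising an exception instead of printing lets the caller decide whether to retry, log, or stop, which is usually easier to manage than checking for None.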

The generated XML contains the PDF's data as well as other useful properties, such as font information and the coordinates of the extracted text.
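Because the exact XML layout is not described here, the sketch below walks the result generically with Python's built-in xml.etree.ElementTree and lists each element that carries text, together with its attributes (where properties such as font or coordinate information would appear). The sample document and its element and attribute names are invented for illustration; point the function at your own result.xml to see the real structure.

```python
import xml.etree.ElementTree as ET


def summarize_xml(root, limit=10):
    """List (tag, attributes, text) for elements that carry text, in document order."""
    rows = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text:
            rows.append((element.tag, dict(element.attrib), text))
            if len(rows) >= limit:
                break
    return rows


# Hypothetical sample: element and attribute names are invented for illustration.
SAMPLE = """<document>
  <page index="0">
    <text x="72" y="90" fontName="Arial" fontSize="12">Invoice</text>
    <text x="72" y="110" fontName="Arial" fontSize="10">Total: 42.00</text>
  </page>
</document>"""

if __name__ == "__main__":
    for tag, attrs, text in summarize_xml(ET.fromstring(SAMPLE)):
        print(tag, attrs, text)
```

For the real output, summarize_xml(ET.parse("result.xml").getroot()) reads the downloaded file directly.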

The PDF to XML endpoint can be configured to fit your requirements. Some of the additional parameters you can provide are listed below; please refer to the documentation for more information.

rect: Defines the coordinates of the area used for data extraction.
lang: Sets the OCR language used when extracting data from scanned PDF, PNG, and JPG documents.
async: Runs processing asynchronously. When enabled, the request returns a JobId, and the output can be retrieved once that job completes.
inline: When enabled, the response contains the XML data directly instead of a URL to it.
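The optional parameters go into the same parameters dictionary as the required ones. Below is a minimal sketch, assuming boolean flags are sent as the string "true" and that rect and lang are passed through as strings in whatever format the documentation specifies. The job-status request used with async is not covered in this article, so wait_for_job takes a hypothetical check_status callable that you would implement against the documented endpoint; the "working" status value is likewise an assumption.

```python
import time


def add_optional_params(parameters, rect=None, lang=None, run_async=False, inline=False):
    """Return a copy of the request parameters with optional settings added.

    Values are passed through unchanged; sending flags as "true" is an assumption.
    """
    params = dict(parameters)
    if rect is not None:
        params["rect"] = rect
    if lang is not None:
        params["lang"] = lang
    if run_async:
        params["async"] = "true"
    if inline:
        params["inline"] = "true"
    return params


def wait_for_job(check_status, job_id, interval=2.0, timeout=300.0):
    """Poll a job until its status is no longer "working" (assumed status value).

    check_status is a hypothetical callable: it takes a job ID and returns a
    status string, e.g. by calling the documented job-status endpoint.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check_status(job_id)
        if status != "working":
            return status
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")
```

Copying the dictionary keeps the base parameters reusable, and injecting check_status keeps the polling loop independent of any particular HTTP call.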

Please try running this sample on your development machine to get more out of this article. Thank you for reading!


