Why use the PDF to XML API?

Extract PDF to XML
With our PDF to XML API, you can convert PDF to XML that includes information about text values, tables, fonts, images, and object positions.

Our API engine runs on secure Amazon AWS infrastructure. Furthermore, all data transfers from your devices to our servers are secured by SSL/TLS. For more details about our security, see https://pdf.co/security.

No licenses required

Here at PDF.co, we use credits as a form of payment for your conversions. You can either choose a subscription plan or top up credits that do not expire.


Support for damaged and scanned text

The PDF.co engine provides automated support for damaged text and for recognizing text in images. Built-in OCR (Optical Character Recognition) supports PDF files with mixed content and multiple languages.

Web API Supports Multiple Languages

JavaScript, PHP, Java, .NET and ASP.NET, C#, and Visual Basic are the main programming languages that can be used on the PDF.co API platform. You can check this GitHub page for detailed code samples and syntax.

Access to Web API Integrations

For non-programmers, a wide range of PDF tasks can be automated through popular RPA platforms such as Zapier, Integromat, UiPath, Blue Prism, and Automation Anywhere. In a few moments, you can connect with the most used applications through our 300+ API integrations.

On-Premise Server for Enterprise

The PDF.co platform runs on secure and certified cloud infrastructure. However, enterprise customers who are required to process sensitive data in-house can choose our on-premise version. It is easily installed on your server and can work offline when required.

PDF to XML API Sample & Demo

Take a look at the Sample PDF File for this demo.

Screenshot of Sample PDF

The code snippets below are in different programming languages. They can convert the Sample PDF File above into XML.

The final result will look like this.

Screenshot of Output XML

Before we proceed with the code, let us first check the /v1/pdf/convert/to/xml parameters and their uses.

Endpoint

URL: https://api.pdf.co/v1/pdf/convert/to/xml
Method: POST
Parameters
url: required. Link to the source file.
lang: optional. english by default. Sets the OCR (image-to-text extraction) language used when a scanned document is detected or when the input is a PNG or JPG image. Other supported values: eng, spa, deu, fra, jpn, chi_sim, chi_tra, kor. You can also specify two languages to be used on the same page, for example: eng+deu, jpn+kor, or other combinations.
inline: optional. Must be one of: true to return data inline, or false to return a link to an output file (default).
unwrap: optional. Unwraps lines to a single line within table cells when lineGrouping is enabled. Must be one of: true or false.
pages: optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: page numbering starts at 0 (zero). To set a range, use a dash (-), for example: 0, 2-5, 7-.
rect: optional. Defines coordinates for extraction, e.g. 51.8, 114.8, 235.5, 204.0. Must be a string.
encrypt: optional. Enables encryption for the output file. Must be one of: true or false.
async: optional. Runs processing asynchronously and returns a jobId to use with job/check. Must be one of: true or false.
name: optional. Output file name.
profiles: optional. Sets a custom configuration. Must be a string. See profiles examples here.
lineGrouping: optional. Line grouping within table cells. Set to 1 to enable the grouping. Must be a string.
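To make the parameter rules above more concrete, here is a minimal Python sketch that assembles a request body and checks a few of the documented constraints before anything is sent. The names build_payload and check_pages are hypothetical helpers, not part of any PDF.co library. The sketch also assumes that only the OCR codes listed above are valid and that rect takes four numbers, so treat it as an illustration rather than a complete validator.

```python
import re

# OCR language codes listed in the parameter description above.
# Assumption: this sketch rejects anything else, although the API may accept more.
SUPPORTED_LANGS = {"eng", "spa", "deu", "fra", "jpn", "chi_sim", "chi_tra", "kor"}

# One item of the "pages" list: "3" (single index), "2-5" (range), "7-" (open range).
_PAGE_ITEM = re.compile(r"(\d+)(?:-(\d*))?")


def check_pages(pages):
    """Raise ValueError unless pages looks like "0, 2-5, 7-" (zero-based indices)."""
    for raw in pages.split(","):
        item = raw.strip()
        match = _PAGE_ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"bad page item: {item!r}")
        start, end = match.group(1), match.group(2)
        if end and int(end) < int(start):
            raise ValueError(f"range goes backwards: {item!r}")


def build_payload(url, lang=None, pages=None, rect=None, inline=False, line_grouping=False):
    """Return a dict that can be serialized as the JSON body of the request."""
    payload = {"url": url}  # url is the only required parameter
    if lang is not None:
        # Two languages can be combined with "+", e.g. "eng+deu".
        for code in lang.split("+"):
            if code not in SUPPORTED_LANGS:
                raise ValueError(f"unsupported OCR language: {code!r}")
        payload["lang"] = lang
    if pages is not None:
        check_pages(pages)
        payload["pages"] = pages
    if rect is not None:
        # rect must be sent as a string, e.g. "51.8, 114.8, 235.5, 204.0".
        payload["rect"] = ", ".join(str(value) for value in rect)
    if inline:
        payload["inline"] = True  # default (false) returns a link to the output file
    if line_grouping:
        payload["lineGrouping"] = "1"  # must be a string, not a number
    return payload
```

For example, build_payload("https://example.com/sample.pdf", lang="eng+deu", pages="0, 2-5, 7-") produces a body that asks for OCR in English and German on the first page, pages 2 through 5, and page 7 onward, counting from zero.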

Now we are ready to write some code.

cURL Code Snippet

curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/xml' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://pdf-temp-files.s3.amazonaws.com/49e77ae7473e47d1a32eac28ffd0c161/sample.pdf"
}'
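For readers who prefer Python, the same request can be sketched with the standard library alone. This is an illustrative equivalent of the cURL command, not the official PDF.co sample. The function names build_request and convert_to_xml are hypothetical, and the sketch assumes the response body is JSON.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_request(api_key, source_url):
    """Build the same POST request as the cURL command, without sending it."""
    body = json.dumps({"url": source_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def convert_to_xml(api_key, source_url, timeout=60.0):
    """Send the request and return the decoded response.

    Assumption: the API answers with a JSON document.
    """
    request = build_request(api_key, source_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a real API key and network access):
# result = convert_to_xml("YOUR_API_KEY", "https://example.com/sample.pdf")
# print(result)
```

Keeping build_request separate from convert_to_xml makes it possible to inspect the headers and body before any network call is made.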

This sample code and other cURL samples are available here.

Now let’s see this program in action.

Output XML using cURL

The PDF to XML sample code in JavaScript is available here.

The PDF to XML sample code in PHP is available here.

The PDF to XML sample code in Python is available here.

The PDF to XML sample code in Java is available here.

The PDF to XML sample code in C# is available here.


NOTE: Use the PDF.co Document Classifier to identify the source of a document. You can easily create and maintain classification rules with the desktop-based Classifier Testing Tool (see the details here).
