How to Convert PDF to XML from an Uploaded File in PHP Using PDF.co Web API

What is PDF.co Web API? It is a REST API that provides data extraction functions and tools for document manipulation, including splitting and merging of PDF files. It includes built-in OCR and image recognition, and it can generate and read barcodes from images, scans, and PDFs.

Learn How to Convert PDF to XML from an Uploaded File in PHP

The sample source code below shows how to handle a task that can be difficult to implement from scratch: PDF to XML conversion in PHP. PDF.co Web API is a Web API with a set of tools for document manipulation, data conversion, data extraction, splitting, and merging of documents.

PHP code samples help speed up writing your application’s code when using PDF.co Web API. To implement this functionality, copy and paste the code below into your app using a code editor, then run your app. PHP is interpreted, so no compile step is needed. This basic PHP sample covers the whole PDF to XML workflow for your app.

Other code samples are also available to help you with your PHP application, including the source code for this sample and PDF to XML conversion samples in other languages.

pdf-to-xml.php

      
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF To XML Extraction Results</title>
</head>
<body>

<?php

// Note: If you have input files larger than 200kb we highly recommend to check "async" mode example.

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co/documentation/api
$pages = $_POST["pages"];

// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have the direct PDF file link, go to the step 3.

// Create URL (the file name is URL-encoded so spaces and "&" do not break the query string)
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0) {
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200) {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.

        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request (keep the response so the error branch below reports the upload, not step 1)
        $result = curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0) {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200) {
                // 3. CONVERT UPLOADED PDF FILE TO XML
                ExtractXML($apiKey, $uploadedFileUrl, $pages);
            } else {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        } else {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    } else {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
} else {
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function ExtractXML($apiKey, $uploadedFileUrl, $pages)
{
    // Create URL
    $url = "https://api.pdf.co/v1/pdf/convert/to/xml";

    // Prepare requests params
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0) {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200) {
            $json = json_decode($result, true);

            if ($json["error"] == false) {
                $resultFileUrl = $json["url"];

                // Display link to the file with conversion results
                echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
            } else {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        } else {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    } else {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

?>

</body>
</html>

Output

PDFco_PDFtoXML_PHP

Now that we’ve seen the source code and its output, let’s move on to its analysis.

This sample consists of three files: the input HTML file, the PHP file, and the sample PDF file. The input HTML file displays controls such as a field for the PDF.co API key, the file input control, and a field for the pages to be processed. The PHP file handles all logic for the PDF to XML conversion. Lastly, a PDF file containing invoice data is used as the input for this demonstration.
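Before calling the API, it can help to check what the form actually submitted. The following is a minimal sketch, not part of the original sample: validateSubmission is a hypothetical helper, and the accepted format for the pages field (comma-separated page numbers and ranges) is an assumption that should be checked against the PDF.co documentation.

```php
<?php
// Hypothetical helper (not in the original sample): checks the $_FILES entry
// and the "pages" field before any API call is made.
// Returns null when everything looks fine, otherwise an error message.
function validateSubmission(array $file, string $pages): ?string
{
    // UPLOAD_ERR_OK (0) means PHP received the uploaded file without problems
    $errorCode = $file["error"] ?? UPLOAD_ERR_NO_FILE;
    if ($errorCode !== UPLOAD_ERR_OK) {
        return "File upload failed with PHP error code " . $errorCode;
    }

    // Assumed format: empty, or comma-separated numbers and ranges such as "0,2-5,7-"
    if ($pages !== "" && !preg_match('/^\d+(-\d*)?(,\d+(-\d*)?)*$/', $pages)) {
        return "Unexpected value in the pages field";
    }

    return null;
}
```

In the sample above, this could be called as validateSubmission($_FILES["file"], $_POST["pages"]) right after reading the form data, stopping early if it returns a message.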

Moving to the main logic in the PHP file, we can divide the code into two parts. First, we upload the input PDF file to the PDF.co cloud and retrieve its public URL. Second, we use this public URL to run the PDF to XML conversion.

To upload the input PDF file to the PDF.co cloud, we use the PDF.co endpoint /v1/file/upload/get-presigned-url. This endpoint takes a file name as input and responds with the pre-signed URL ($json["presignedUrl"]) and the public URL ($json["url"]). We then upload the input file to the pre-signed URL with a PUT request. Once the upload is complete, the public URL points to the uploaded file and can be used in later API calls. Please note that this public URL is temporary and the file is stored for only a few minutes on the PDF.co cloud.
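The first step can be isolated into two small functions, which makes it easier to see what is sent and what comes back. This is a sketch under stated assumptions: buildPresignedUrlRequest and parsePresignedResponse are hypothetical names, and the only response keys relied on are the two used in the sample (presignedUrl and url).

```php
<?php
// Hypothetical helper: builds the request URL for the pre-signed URL endpoint.
// http_build_query() URL-encodes the values, so file names containing
// spaces or "&" do not break the query string.
function buildPresignedUrlRequest(string $fileName): string
{
    $query = http_build_query(
        ["name" => $fileName, "contenttype" => "application/octet-stream"],
        "",
        "&"
    );
    return "https://api.pdf.co/v1/file/upload/get-presigned-url?" . $query;
}

// Hypothetical helper: extracts the two URLs from the JSON response body.
// Returns [uploadUrl, publicUrl]; throws if either key is missing.
function parsePresignedResponse(string $body): array
{
    $json = json_decode($body, true);
    if (!is_array($json) || empty($json["presignedUrl"]) || empty($json["url"])) {
        throw new RuntimeException("Unexpected response: " . $body);
    }
    return [$json["presignedUrl"], $json["url"]];
}
```

The first element of the returned array is the target of the PUT request, and the second is the URL passed to the conversion call.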

The PDF.co endpoint /v1/pdf/convert/to/xml converts PDF to XML. Its input parameters are the URL of the PDF file (url) and the pages to be processed (pages). The request body is JSON: we first build a PHP array with the input parameters and then convert it to JSON with the json_encode function. The PDF.co API key is passed in the request header named x-api-key, which PDF.co uses to authenticate requests.
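To make the shape of the conversion request explicit, the pieces described above can be assembled separately from the cURL call. This is a minimal sketch: buildConvertRequest is a hypothetical helper, and it uses only the endpoint, header, and parameters that appear in the sample.

```php
<?php
// Hypothetical helper: returns the endpoint, headers, and JSON body
// for the PDF to XML conversion request, without sending anything.
function buildConvertRequest(string $apiKey, string $fileUrl, string $pages): array
{
    return [
        "url" => "https://api.pdf.co/v1/pdf/convert/to/xml",
        "headers" => [
            "x-api-key: " . $apiKey,           // authenticates the request
            "Content-Type: application/json",  // the body below is JSON
        ],
        "body" => json_encode(["url" => $fileUrl, "pages" => $pages]),
    ];
}
```

The returned values map directly onto the cURL options used in the sample: "url" to CURLOPT_URL, "headers" to CURLOPT_HTTPHEADER, and "body" to CURLOPT_POSTFIELDS.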

The response contains a URL ($json["url"]) that points to the output XML data. Looking closely at the output, we can confirm that all invoice data from the input PDF is present in well-formatted XML. This XML also carries useful information such as the position of the data in the PDF file and font details such as font type, color, and size.
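The response handling can also be written as a single function that either returns the result URL or fails loudly. This is a sketch that assumes only the three response keys the sample reads (error, message, and url); extractResultUrl is a hypothetical name.

```php
<?php
// Hypothetical helper: returns the URL of the generated XML file,
// or throws when the service reports an error or the body is malformed.
function extractResultUrl(string $body): string
{
    $json = json_decode($body, true);
    if (!is_array($json)) {
        throw new RuntimeException("Response is not valid JSON");
    }
    // The sample treats a truthy "error" field as a service-reported failure
    if (!empty($json["error"])) {
        throw new RuntimeException($json["message"] ?? "Unknown service error");
    }
    if (empty($json["url"])) {
        throw new RuntimeException("Response does not contain a result URL");
    }
    return $json["url"];
}
```

Throwing exceptions instead of echoing inside the function keeps the HTML output in one place, which is a design choice of this sketch rather than of the original sample.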

In this sample, we perform a basic PDF to XML conversion and prepare the request with minimal data. PDF.co also provides advanced request parameters that let us customize the output. Please refer to the PDF.co API documentation for more information.
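One way to leave room for such parameters is to let the payload builder accept extra options. The sketch below is illustrative only: buildPayload is a hypothetical helper, and the "async" key in the usage note is an assumption based on the note at the top of the sample, so its exact name and behavior should be confirmed in the PDF.co documentation.

```php
<?php
// Hypothetical helper: builds the JSON payload from the two required
// parameters plus any optional ones supplied by the caller.
function buildPayload(string $fileUrl, string $pages, array $extra = []): string
{
    // Required fields come last so they override same-named keys in $extra
    $parameters = array_merge($extra, ["url" => $fileUrl, "pages" => $pages]);
    return json_encode($parameters);
}
```

For example, buildPayload($uploadedFileUrl, $pages, ["async" => true]) would add an assumed async flag to the request body while keeping url and pages unchanged.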

To get more out of this sample, please try this program on your machine. Thank you for reading!
