How to Convert an Uploaded PDF File to XML in PHP with PDF.co Web API

Learn how to convert an uploaded PDF file to XML in PHP using the PDF.co PDF to XML API

The sample source code below shows how to handle PDF to XML conversion in PHP. PDF.co Web API is a Web API with a set of tools for document manipulation, data conversion, data extraction, and splitting and merging of documents. It also includes image recognition, built-in OCR, barcode generation, and barcode decoders that read barcodes from scans, pictures, and PDF files.

These PHP code samples help speed up development when using PDF.co Web API. To implement this functionality, copy the code below into your app using your code editor, then run it on a web server with PHP enabled. PHP is interpreted, so no compile step is needed. The sample covers the whole PDF to XML workflow for your app.

A free trial version of PDF.co Web API is available on our website. The trial version also includes other code samples to help with your PHP application. The source code for this sample is available in our GitHub repository at this location. You can also find source code for PDF to XML conversion in other languages at this GitHub location.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

Let’s review the source code and its output, and then we’ll analyze it.

Source Code:

pdf-to-xml.php

      
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF To XML Extraction Results</title>
</head>
<body>

<?php

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co/documentation/api
$pages = $_POST["pages"];

// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have the direct PDF file link, go to the step 3.

// Create URL (the file name is URL-encoded so spaces and special characters do not break the query string)
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.

        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request (keep the response body so an upload error can be displayed below)
        $result = curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // 3. CONVERT UPLOADED PDF FILE TO XML
                ExtractXML($apiKey, $uploadedFileUrl, $pages);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function ExtractXML($apiKey, $uploadedFileUrl, $pages)
{
    // Create URL
    $url = "https://api.pdf.co/v1/pdf/convert/to/xml";

    // Prepare request params
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;

    // Create JSON payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if ($json["error"] == false)
            {
                $resultFileUrl = $json["url"];

                // Display link to the file with conversion results
                echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

?>

</body>
</html>

Output:

Now that we’ve seen the source code and its output, let’s move on to its analysis.

This sample consists of three files: the input HTML file, the PHP file, and the sample PDF file. The input HTML file displays controls such as a field for the PDF.co API key, the file input control, and a field for the pages to be processed. The PHP file handles all logic for the PDF to XML conversion. Lastly, the PDF file containing invoice data is used as input for this demonstration.
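Since the PHP file reads the API key, the pages field, and the uploaded file straight from the form, it can help to check them before calling the API. The following is only a minimal sketch: `validateSubmission` is a hypothetical helper that is not part of the original sample, and the arrays passed to it stand in for `$_POST` and `$_FILES`.

```php
<?php
// Hypothetical helper (not in the original sample): checks the submitted form
// data before any API call is made. Field names match those read by the sample.
function validateSubmission(array $post, array $files): array
{
    $errors = [];

    // The sample reads the API key from $_POST["apiKey"]
    if (trim($post["apiKey"] ?? "") === "") {
        $errors[] = "API key is required.";
    }

    // The sample reads the uploaded PDF from $_FILES["file"]
    $upload = $files["file"] ?? null;
    if ($upload === null || ($upload["error"] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        $errors[] = "A PDF file must be uploaded.";
    }

    return $errors;
}

// Hand-made arrays standing in for $_POST and $_FILES (illustrative values only)
$errors = validateSubmission(
    ["apiKey" => "", "pages" => "0"],
    ["file" => ["name" => "sample.pdf", "tmp_name" => "/tmp/php123", "error" => UPLOAD_ERR_OK]]
);
print_r($errors);
```

In the real page, the call would be `validateSubmission($_POST, $_FILES)`, and the upload would only proceed when the returned array is empty.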

Moving to the main logic in the PHP file, we can logically divide the code into two parts. First, we upload the input PDF file to the PDF.co cloud and retrieve its public URL. Second, we use this public URL to run the PDF to XML conversion.

To upload the input PDF file to the PDF.co cloud, we use the PDF.co endpoint /v1/file/upload/get-presigned-url. This endpoint takes the file name as input and responds mainly with the pre-signed URL ($json["presignedUrl"]) and the public URL ($json["url"]). We then use the pre-signed URL to upload the input file with a PUT request. Once the upload is complete, the public URL points to the uploaded file and can be used in later API calls. Please note that this public URL is temporary and the file is stored for only a few minutes on the PDF.co cloud.
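File names submitted through a form often contain spaces or characters such as `&`, which would break a query string built by plain concatenation. As a small sketch of one way to build the request URL safely, the hypothetical helper `buildPresignedUrlRequest` below (a name introduced here for illustration) uses PHP’s `http_build_query` with the same `name` and `contenttype` parameters as the sample.

```php
<?php
// Hypothetical helper: builds the request URL for the presigned-upload endpoint
// used in the sample, URL-encoding every query parameter.
function buildPresignedUrlRequest(string $fileName): string
{
    $query = http_build_query(
        ["name" => $fileName, "contenttype" => "application/octet-stream"],
        "",
        "&",
        PHP_QUERY_RFC3986 // encode spaces as %20 rather than +
    );

    return "https://api.pdf.co/v1/file/upload/get-presigned-url?" . $query;
}

// Illustrative file name containing a space and an ampersand
echo buildPresignedUrlRequest("my invoice&copy.pdf"), PHP_EOL;
// prints https://api.pdf.co/v1/file/upload/get-presigned-url?name=my%20invoice%26copy.pdf&contenttype=application%2Foctet-stream
```

The resulting string can be passed to `CURLOPT_URL` exactly as `$url` is in the sample.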

The PDF.co endpoint /v1/pdf/convert/to/xml is used to convert PDF to XML format. Its input parameters are the URL of the PDF file and the pages that need to be processed. The request body is in JSON format, so we first prepare a PHP array holding each input parameter and then convert this array to JSON using the json_encode function. The PDF.co API key is passed in the request header under the x-api-key header name. PDF.co uses this header to authenticate requests.
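To make the shape of this request easier to see, here is a minimal sketch that assembles the endpoint, headers, and JSON body without sending anything. `buildConvertRequest` is a hypothetical helper, and the API key, file URL, and pages value are placeholder assumptions rather than values from the original sample.

```php
<?php
// Hypothetical helper: gathers the pieces of the conversion request
// (endpoint, headers, JSON body) so they can be inspected before sending.
function buildConvertRequest(string $apiKey, string $fileUrl, string $pages): array
{
    return [
        "url" => "https://api.pdf.co/v1/pdf/convert/to/xml",
        "headers" => [
            "x-api-key: " . $apiKey,          // authenticates the request
            "Content-type: application/json", // the body is JSON
        ],
        // Same two parameters as the sample: "url" and "pages"
        "body" => json_encode(["url" => $fileUrl, "pages" => $pages], JSON_UNESCAPED_SLASHES),
    ];
}

// Placeholder values for illustration only
$request = buildConvertRequest("YOUR_API_KEY", "https://example.com/sample.pdf", "0");
echo $request["body"], PHP_EOL;
// prints {"url":"https://example.com/sample.pdf","pages":"0"}
```

The three entries map directly onto `CURLOPT_URL`, `CURLOPT_HTTPHEADER`, and `CURLOPT_POSTFIELDS` in the sample’s ExtractXML function.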

The response contains a URL ($json["url"]) that points to the output XML data. By closely observing the output, we can confirm that all invoice data from the input PDF is present in nicely formatted XML. This XML also carries useful information such as the position of the data in the PDF file and font details such as font type, color, and size.
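The response handling can also be separated from the HTML output. The sketch below assumes only the three response fields the sample already reads (`error`, `message`, and `url`). `extractResultUrl` is a hypothetical helper, and the JSON strings are made-up examples, not real API responses.

```php
<?php
// Hypothetical helper: returns the result URL from a conversion response body,
// or throws if the body is not JSON or the service reported an error.
function extractResultUrl(string $responseBody): string
{
    $json = json_decode($responseBody, true);

    if (!is_array($json)) {
        throw new RuntimeException("Response is not valid JSON.");
    }
    if (!empty($json["error"])) {
        throw new RuntimeException($json["message"] ?? "Unknown API error.");
    }
    if (empty($json["url"])) {
        throw new RuntimeException("Response does not contain a result URL.");
    }

    return $json["url"];
}

// Made-up response body for illustration
$sample = '{"error": false, "url": "https://example.com/result.xml"}';
echo extractResultUrl($sample), PHP_EOL;
// prints https://example.com/result.xml
```

Throwing exceptions keeps error reporting in one place. The caller can catch them and decide how to display the message.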

In this sample, we perform a basic conversion from PDF to XML and prepare the request with minimal data. PDF.co also provides advanced request parameters, so the output can be customized to fit your requirements. Please refer to the PDF.co API documentation for more information.

To get more out of this sample, please try this program on your machine. Thank you for reading!
