How to Convert an Uploaded PDF File to HTML in PHP Using PDF.co Web API

Learn how to upload a PDF file and convert it to HTML in PHP with the PDF to HTML API

Today we will explain the steps and the algorithm for converting an uploaded PDF file to HTML, and show how to make it work in your application. PDF.co Web API helps with PDF to HTML conversion in PHP. It is a flexible Web API that includes a full set of functions, from e-signature requests to data extraction, OCR, image recognition, and PDF splitting. It can also generate barcodes and read barcodes from images, scans, and PDFs.

The PDF.co Web API for PHP, together with the instructions and the code below, will help you learn how to convert an uploaded PDF file to HTML. Copy and paste this PHP sample code into your project. The full source code is also available in our GitHub repository. PHP is interpreted, so there is nothing to compile: place the script on a PHP-enabled web server and post the upload form to it. Enjoy coding with ready-to-use PHP samples that implement the PDF to HTML API using PDF.co Web API.

A free trial version of ByteScout is available for download from our website. Programming tutorials along with source code samples are included.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

Let’s review the source code and its output first, then we’ll analyze the important code snippets.

Source Code

pdf-to-html.php

      
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF To HTML Extraction Results</title>
</head>
<body>

<?php

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co/documentation/api
$extractionType = $_POST["extractionType"];
$pages = $_POST["pages"];
$plainHtml = $_POST["plainHtml"];
$columnLayout = $_POST["columnLayout"];


// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have the direct PDF file link, go to the step 3.

// Create URL (the file name is URL-encoded so that spaces and "&" do not break the query string)
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.

        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request (keep the upload response so errors report the right body)
        $uploadResult = curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // 3. CONVERT UPLOADED PDF FILE TO HTML

                PdfToHtml($apiKey, $uploadedFileUrl, $pages, $plainHtml, $columnLayout);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $uploadResult . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function PdfToHtml($apiKey, $uploadedFileUrl, $pages, $plainHtml, $columnLayout)
{
    // Create URL
    $url = "https://api.pdf.co/v1/pdf/convert/to/html";

    // Prepare request parameters
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;
    $parameters["simple"] = $plainHtml;
    $parameters["columns"] = $columnLayout;

    // Create JSON payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if ($json["error"] == false)
            {
                $resultFileUrl = $json["url"];

                // Display link to the file with conversion results
                echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

?>

</body>
</html>
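
The sample reads $_FILES["file"] directly, so a failed or missing upload only surfaces later as a request error. A small check before step 1 can make such failures easier to diagnose. The following is a minimal sketch, not part of the original sample; the helper name describeUploadProblem is hypothetical.

```php
<?php
// Hypothetical helper (not part of the PDF.co sample).
// Returns null when the upload entry looks usable, or a short message
// describing the problem otherwise.
function describeUploadProblem(?array $file): ?string
{
    // The entry is absent when the form was posted without a file field.
    if ($file === null || !isset($file["error"], $file["tmp_name"], $file["name"])) {
        return "No file was submitted.";
    }
    // PHP reports upload failures through the "error" key of the entry.
    if ($file["error"] !== UPLOAD_ERR_OK) {
        return "Upload failed with PHP error code " . $file["error"] . ".";
    }
    if ($file["name"] === "") {
        return "The uploaded file has no name.";
    }
    return null;
}
```

In the sample, this could be called as describeUploadProblem($_FILES["file"] ?? null) right after the form data is read, skipping the API calls when the result is not null.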

Output

Now that we’ve reviewed the source code and its output, let’s analyze the code briefly.

We can logically divide the code into two parts. First, we upload the input PDF file to the PDF.co cloud and get its public URL. After that, we use that URL to convert the file to HTML.

The PDF.co API endpoint /v1/file/upload/get-presigned-url is used to upload a file to PDF.co. The request contains the name of the file to be uploaded. The response primarily returns two URLs: a pre-signed URL ($json["presignedUrl"]) and a public URL ($json["url"]). The pre-signed URL is used to upload the actual file to PDF.co with a PUT request. The public URL points to the uploaded file. Please note: this public URL is temporary and only valid for a few minutes.
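
The request URL and the two-URL response described above can be handled with a pair of small helpers. This is a minimal sketch under stated assumptions: the function names and the returned array keys (uploadUrl, fileUrl) are hypothetical, and the only response fields relied on are presignedUrl and url, as used in the sample.

```php
<?php
// Hypothetical helpers for illustration; not part of the PDF.co sample.

// Build the request URL for step 1. http_build_query() URL-encodes the
// values, so file names with spaces or "&" do not break the query string.
function buildPresignedUrlRequest(string $fileName): string
{
    $query = http_build_query(array(
        "name"        => $fileName,
        "contenttype" => "application/octet-stream",
    ));
    return "https://api.pdf.co/v1/file/upload/get-presigned-url?" . $query;
}

// Extract the two URLs from the JSON response body.
// Returns null when the body is not valid JSON or a field is missing.
function parsePresignedUrlResponse(string $body): ?array
{
    $json = json_decode($body, true);
    if (!is_array($json) || empty($json["presignedUrl"]) || empty($json["url"])) {
        return null;
    }
    return array(
        "uploadUrl" => $json["presignedUrl"], // target of the PUT request
        "fileUrl"   => $json["url"],          // passed to later API calls
    );
}
```

Returning null for a malformed response lets the caller show an error instead of attempting a PUT to an empty URL.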

PDF.co makes converting PDF to HTML straightforward. The conversion is done by the PDF.co endpoint /v1/pdf/convert/to/html. Its input parameters include the URL of the input PDF file (url) and the pages to be converted to HTML (pages). The endpoint also supports advanced parameters that control whether CSS formatting is retained (simple) and whether a newspaper-style column layout is preserved (columns). For all supported input parameters, please see the API documentation. The response of this endpoint primarily contains the URL of the output HTML.
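
As a rough sketch of the request body and response handling for this endpoint, the helpers below use the parameter names that appear in the sample (url, pages, simple, columns) and the response fields it reads (error, message, url). The function names and the "ok"/"value" result shape are hypothetical, and passing simple and columns as booleans is an assumption; the sample forwards the raw form values instead.

```php
<?php
// Hypothetical helpers for illustration; not part of the PDF.co sample.

// Build the JSON body for /v1/pdf/convert/to/html.
// Assumption: "simple" and "columns" are sent as JSON booleans.
function buildPdfToHtmlPayload(string $fileUrl, string $pages, bool $plainHtml, bool $columnLayout): string
{
    return json_encode(array(
        "url"     => $fileUrl,
        "pages"   => $pages,
        "simple"  => $plainHtml,
        "columns" => $columnLayout,
    ), JSON_UNESCAPED_SLASHES);
}

// Interpret the response body.
// Returns array("ok" => bool, "value" => result URL or error message).
function readPdfToHtmlResponse(string $body): array
{
    $json = json_decode($body, true);
    if (!is_array($json)) {
        return array("ok" => false, "value" => "Response was not valid JSON.");
    }
    if (!empty($json["error"])) {
        return array("ok" => false, "value" => $json["message"] ?? "Unknown error.");
    }
    if (empty($json["url"])) {
        return array("ok" => false, "value" => "Response did not contain a result URL.");
    }
    return array("ok" => true, "value" => $json["url"]);
}
```

Separating payload construction from the cURL call keeps both pieces easy to check without sending a request.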

This is how easy it is to convert PDF to HTML using PDF.co endpoints. To get more out of this article, please try this sample on your machine. Thank you for reading!
