How to Convert PDF to HTML in PHP from Uploaded File using PDF.co Web API
PDF.co Web API is a Web API with a set of tools for document manipulation, data conversion, data extraction, and splitting and merging of documents. It includes image recognition, built-in OCR, barcode generation, and barcode decoders that read barcodes from scans, pictures, and PDF files.
Learn How to Convert PDF to HTML in PHP from the Uploaded File using PDF.co Web API Source Code
Today we will explain the steps for converting PDF to HTML in PHP from an uploaded file and how to make it work in your application. PDF.co Web API provides a PDF to HTML endpoint that can be called from PHP. It is a flexible Web API with a full set of functions, from e-signature requests to data extraction, OCR, image recognition, PDF splitting, and PDF merging. It can also generate barcodes and read barcodes from images, scans, and PDF files.
The PDF.co Web API, together with the instructions and the code below, will help you learn how to convert PDF to HTML from an uploaded file. Copy and paste this PHP sample code into your project. The full source code is also available in our GitHub repository. After that, just run your project. Enjoy working with ready-to-use PHP sample code that implements PDF to HTML conversion using PDF.co Web API.
pdf-to-html.php
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF To HTML Extraction Results</title>
</head>
<body>

<?php

// Note: If you have input files larger than 200kb we highly recommend to check "async" mode example.

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co/documentation/api
$extractionType = $_POST["extractionType"];
$pages = $_POST["pages"];
$plainHtml = $_POST["plainHtml"];
$columnLayout = $_POST["columnLayout"];

// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have the direct PDF file link, go to the step 3.

// Create URL (the file name is URL-encoded so spaces and special characters do not break the query string)
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.

        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request
        curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // 3. CONVERT UPLOADED PDF FILE TO HTML

                PdfToHtml($apiKey, $uploadedFileUrl, $pages, $plainHtml, $columnLayout);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function PdfToHtml($apiKey, $uploadedFileUrl, $pages, $plainHtml, $columnLayout)
{
    // Create URL
    $url = "https://api.pdf.co/v1/pdf/convert/to/html";

    // Prepare request params
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;
    $parameters["simple"] = $plainHtml;
    $parameters["columns"] = $columnLayout;

    // Create JSON payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if ($json["error"] == false)
            {
                $resultFileUrl = $json["url"];

                // Display link to the file with conversion results
                echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

?>

</body>
</html>
Output
Now that we’ve reviewed the source code to convert PDF to HTML in PHP and its output, let’s analyze the code briefly.
We can logically divide the code into two parts. First, we upload the input PDF file to the PDF.co cloud and get a public URL. After that, we use that URL to convert the file to HTML.
The PDF.co API endpoint /v1/file/upload/get-presigned-url is used to upload a file to PDF.co. The request contains the name of the file to be uploaded. The response primarily returns two URLs: a pre-signed URL ($json["presignedUrl"]) and a public URL ($json["url"]). The pre-signed URL is used to upload the actual file to PDF.co with a PUT request. The public URL points to the uploaded file. Please note: this public URL is temporary and only valid for a few minutes.
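The request and response handling described above can be sketched as two small helpers. This is a minimal sketch under stated assumptions, not an official client: the function names buildPresignedUrlRequest and parseUploadResponse and the array keys uploadUrl and fileUrl are hypothetical. Only the endpoint path and the presignedUrl and url response keys come from the sample.

```php
<?php
// Hypothetical helper: builds the URL for the get-presigned-url call.
// http_build_query() URL-encodes the values, so file names with spaces
// or ampersands do not break the query string.
function buildPresignedUrlRequest(string $fileName): string
{
    $query = http_build_query(
        [
            "name" => $fileName,
            "contenttype" => "application/octet-stream",
        ],
        "",
        "&"
    );

    return "https://api.pdf.co/v1/file/upload/get-presigned-url?" . $query;
}

// Hypothetical helper: extracts the two URLs from the JSON response body.
// Returns null when the body is not valid JSON or a key is missing.
function parseUploadResponse(string $body): ?array
{
    $json = json_decode($body, true);

    if (!is_array($json) || !isset($json["presignedUrl"], $json["url"])) {
        return null;
    }

    return [
        "uploadUrl" => $json["presignedUrl"], // target of the PUT request
        "fileUrl" => $json["url"],            // temporary link for later API calls
    ];
}
```

Keeping these steps free of network calls makes them easy to check in isolation. The cURL code from the sample would sit between the two helpers.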
PDF.co makes converting PDF to HTML straightforward. The conversion is done by the PDF.co endpoint /v1/pdf/convert/to/html. Its input parameters include the URL of the input PDF file, the pages to be converted to HTML, and so on. The endpoint also supports advanced parameters that control whether the CSS formatting is retained and whether a column layout, like that of a newspaper, is preserved. For all supported input parameters, please see the API documentation. The response of this endpoint primarily contains the URL of the output HTML.
That is how easy it is to convert PDF to HTML using PDF.co endpoints. To get more out of this article, please try this sample on your machine. Thank you for reading!
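The payload and response for the conversion call can be sketched as follows. This is a minimal illustration, not the official implementation: the helper names buildConvertPayload and parseConvertResponse are hypothetical, and converting the submitted form values to booleans is an assumption about what the endpoint expects. The parameter names url, pages, simple and columns and the response keys error, message and url are taken from the sample.

```php
<?php
// Hypothetical helper: turns raw form values into the JSON payload for
// /v1/pdf/convert/to/html. Parameter names match the sample code.
function buildConvertPayload(string $fileUrl, string $pages, $plainHtml, $columnLayout): string
{
    $parameters = [
        "url" => $fileUrl,
        "pages" => $pages,
        // Assumption: the form posts checkbox-like values ("1", "true", "on").
        // FILTER_VALIDATE_BOOLEAN maps those to true and anything else to false.
        "simple" => filter_var($plainHtml, FILTER_VALIDATE_BOOLEAN),
        "columns" => filter_var($columnLayout, FILTER_VALIDATE_BOOLEAN),
    ];

    return json_encode($parameters, JSON_UNESCAPED_SLASHES);
}

// Hypothetical helper: returns the URL of the generated HTML file,
// or throws with the message reported by the service.
function parseConvertResponse(string $body): string
{
    $json = json_decode($body, true);

    if (!is_array($json)) {
        throw new RuntimeException("Response is not valid JSON");
    }
    if (!empty($json["error"])) {
        throw new RuntimeException($json["message"] ?? "Unknown service error");
    }
    if (!isset($json["url"])) {
        throw new RuntimeException("Response does not contain a result URL");
    }

    return $json["url"];
}
```

Throwing an exception instead of echoing the error lets the calling page decide how to present failures.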