How to Convert PDF to JSON from a URL in Java Using the PDF.co Web API

Developers of any level can write Java code that converts a PDF to JSON using the PDF.co Web API. PDF.co Web API is a REST API that provides data extraction functions and tools for document manipulation, including splitting and merging PDF files. It also includes built-in OCR and image recognition, and it can generate and read barcodes from images, scans, and PDFs.

You can save time on writing and testing by taking the code below and using it in your application. Open your Java project, copy and paste the code, and run your app. This Java sample shows the whole process of calling the PDF to JSON API from your app. You can also get this Java code snippet from our GitHub repository.

A free trial version of ByteScout is available for download from our website. Programming tutorials and source code samples are included.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

Let's review the source code first, then analyze it.

Source Code:

ByteScoutWebApiExample.iml

      
<?xml version="1.0" encoding="UTF-8"?>
<module type="JAVA_MODULE" version="4">
  <component name="NewModuleRootManager" inherit-compiler-output="true">
    <exclude-output />
    <content url="file://$MODULE_DIR$">
      <sourceFolder url="file://$MODULE_DIR$/src" isTestSource="false" />
    </content>
    <orderEntry type="inheritedJdk" />
    <orderEntry type="sourceFolder" forTests="false" />
    <orderEntry type="library" name="com.google.code.gson:gson:2.8.1" level="project" />
    <orderEntry type="library" name="com.squareup.okhttp3:okhttp:3.8.1" level="project" />
  </component>
</module>

main.java

package com.company;

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import okhttp3.*;

import java.io.*;
import java.net.*;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main {

    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co/documentation/api
    final static String API_KEY = "***********************************";

    // Source PDF file
    final static Path SourceFile = Paths.get(".\\sample.pdf");
    // Comma-separated list of page indices (or ranges) to process.
    // Leave empty for all pages. Example: '0,2-5,7-'.
    final static String Pages = "";
    // PDF document password. Leave empty for unprotected documents.
    final static String Password = "";
    // Destination JSON file name
    final static Path DestinationFile = Paths.get(".\\result.json");

    public static void main(String[] args) throws IOException {
        // Create HTTP client instance
        OkHttpClient webClient = new OkHttpClient();

        // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
        // * If you already have a direct file URL, skip to step 3.

        // Prepare URL for `Get Presigned URL` API call
        String query = String.format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=%s",
                SourceFile.getFileName());

        // Prepare request
        Request request = new Request.Builder()
                .url(query)
                .addHeader("x-api-key", API_KEY) // (!) Set API Key
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200) {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error) {
                // Get URL to use for the file upload
                String uploadUrl = json.get("presignedUrl").getAsString();
                // Get URL of uploaded file to use with later API calls
                String uploadedFileUrl = json.get("url").getAsString();

                // 2. UPLOAD THE FILE TO CLOUD.
                if (uploadFile(webClient, API_KEY, uploadUrl, SourceFile)) {
                    // 3. CONVERT UPLOADED PDF FILE TO JSON
                    PdfToJson(webClient, API_KEY, DestinationFile, Password, Pages, uploadedFileUrl);
                }
            } else {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        } else {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static void PdfToJson(OkHttpClient webClient, String apiKey, Path destinationFile,
                                 String password, String pages, String uploadedFileUrl) throws IOException {
        // Prepare URL for `PDF To JSON` API call
        String query = "https://api.pdf.co/v1/pdf/convert/to/json";

        // Make correctly escaped (encoded) URL
        URL url = null;
        try {
            url = new URI(null, query, null).toURL();
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }

        // Create JSON payload
        String jsonPayload = String.format(
                "{\"name\": \"%s\", \"password\": \"%s\", \"pages\": \"%s\", \"url\": \"%s\"}",
                destinationFile.getFileName(), password, pages, uploadedFileUrl);

        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonPayload);

        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .addHeader("x-api-key", apiKey) // (!) Set API Key
                .addHeader("Content-Type", "application/json")
                .post(body)
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200) {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error) {
                // Get URL of generated JSON file
                String resultFileUrl = json.get("url").getAsString();

                // Download JSON file
                downloadFile(webClient, resultFileUrl, destinationFile.toFile());

                System.out.printf("Generated JSON file saved as \"%s\" file.", destinationFile.toString());
            } else {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        } else {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static boolean uploadFile(OkHttpClient webClient, String apiKey, String url, Path sourceFile) throws IOException {
        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/octet-stream"), sourceFile.toFile());

        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .addHeader("x-api-key", apiKey) // (!) Set API Key
                .addHeader("content-type", "application/octet-stream")
                .put(body)
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        return (response.code() == 200);
    }

    public static void downloadFile(OkHttpClient webClient, String url, File destinationFile) throws IOException {
        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        byte[] fileBytes = response.body().bytes();

        // Save downloaded bytes to file
        OutputStream output = new FileOutputStream(destinationFile);
        output.write(fileBytes);
        output.flush();
        output.close();

        response.close();
    }
}

Though this code snippet is largely self-explanatory, let's review its main parts. The code can be divided into two parts. First, we upload the input PDF and get a public URL; second, we use that URL to convert the PDF to JSON and save the result to the output file. At the beginning of the code, we gather all the necessary information: the PDF.co API key, the source file path, the pages to be converted, the document password, and the destination JSON file path.

To upload a PDF file to the PDF.co cloud, we need a pre-signed URL. As the name suggests, the PDF.co API endpoint /v1/file/upload/get-presigned-url returns one. This endpoint expects the input file name as an argument and returns the pre-signed URL (json.get("presignedUrl")) along with the public URL (json.get("url")). We then upload the input PDF file with a PUT request to the pre-signed URL. After a successful upload, the public URL points to the uploaded source file. This public URL is temporary and, for security reasons, remains available only for a limited time.
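The sample code inserts the file name into the query string as-is, which works for a name like sample.pdf but can break for names that contain spaces or an ampersand. The following is a minimal sketch of one way to build the same request URL with a percent-encoded name. The class and method names (PresignedUrlQuery, build) are hypothetical helpers introduced here for illustration, and the sketch assumes the service decodes the name parameter in the usual way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the PDF.co sample code.
final class PresignedUrlQuery {

    // Endpoint taken from the tutorial code.
    static final String ENDPOINT = "https://api.pdf.co/v1/file/upload/get-presigned-url";

    // Builds the request URL with the file name percent-encoded, so that
    // spaces or '&' in the name cannot corrupt the query string.
    static String build(String fileName) {
        try {
            // URLEncoder applies form encoding, where a space becomes '+'.
            // A literal '+' in the input is already encoded as %2B, so it is
            // safe to turn every remaining '+' into %20.
            String encoded = URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
            return ENDPOINT + "?contenttype=application/octet-stream&name=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("my report & notes.pdf"));
    }
}
```

The returned string could be passed to Request.Builder.url(...) in place of the String.format call in step 1.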

The PDF.co API endpoint /v1/pdf/convert/to/json converts a PDF to JSON format. The input parameters are assembled into a JSON payload and passed in the request body. The response contains a URL that points to the converted JSON data, which the sample then downloads and saves to the destination file. This endpoint has built-in support for scanned PDF files. We can customize the JSON output by providing additional parameters. The following are examples of additional input parameters.

rect If we want to extract JSON from only a certain portion of the input PDF, we can use this parameter. It expects the coordinates of the extraction region as its value.
lang If we want to specify a different OCR language for a scanned PDF, we can use this parameter. By default, English is used for scanned PDFs.
async This parameter is helpful when the input file is large and processing might take time. If its value is set to true, the API executes in async mode and returns a job as output. Once that job is completed, we can access the data.
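The sample code builds its JSON payload with String.format, which produces invalid JSON if a value such as the password contains a double quote. Below is a minimal sketch of a payload builder that escapes string values and accepts the optional parameters listed above. ConvertPayload and its methods are hypothetical names, and the example values for rect and lang are placeholders only; the exact value formats, and whether async is sent as a JSON boolean, should be confirmed against the PDF.co documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PDF.co sample code.
final class ConvertPayload {

    // Escapes the characters that must not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serializes a flat map: Boolean values become JSON booleans,
    // everything else becomes an escaped JSON string.
    static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\": ");
            Object v = e.getValue();
            if (v instanceof Boolean) {
                sb.append(v);
            } else {
                sb.append('"').append(escape(String.valueOf(v))).append('"');
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("name", "result.json");
        fields.put("password", "");
        fields.put("pages", "");
        fields.put("url", "https://example.com/sample.pdf"); // placeholder URL
        fields.put("rect", "0, 0, 200, 100"); // placeholder; check the docs for the real format
        fields.put("lang", "eng");            // assumed language code; check the docs
        fields.put("async", true);            // assumed to be a JSON boolean
        System.out.println(toJson(fields));
    }
}
```

The resulting string could replace jsonPayload in PdfToJson. In a project that already depends on Gson, as this one does, building a JsonObject and calling toString() would be an equally reasonable choice.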

Please refer to the PDF.co documentation for more information on the available parameters.
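When the async parameter is used, the caller has to wait until the job completes before reading the result. The article does not show how the job status is queried, so the sketch below keeps that part abstract: it is a generic bounded polling loop, and the isDone callback is a hypothetical stand-in for whatever status check the PDF.co documentation describes. JobPoller and waitUntilDone are names introduced here for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the PDF.co sample code.
final class JobPoller {

    // Calls isDone up to maxAttempts times, sleeping delayMillis between
    // checks. Returns true as soon as isDone reports completion, or false
    // if the attempts run out or the thread is interrupted.
    static boolean waitUntilDone(BooleanSupplier isDone, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isDone.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated job that reports completion on the third check.
        int[] checks = {0};
        boolean done = waitUntilDone(() -> ++checks[0] >= 3, 10, 100L);
        System.out.println("done=" + done + ", checks=" + checks[0]);
    }
}
```

Bounding the number of attempts keeps the application from waiting forever if a job never finishes; a real status check would replace the simulated lambda in main.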

Run and experiment with this code on your machine to get a better feel for it. Thank you for reading!
