How to Convert PDF to JSON Meta in C#

In this article, you will see how to convert a PDF document to JSON Meta using the PDF.CO Web API in the C# programming language.

The PDF.CO Web API provides an endpoint for converting a PDF document to JSON Meta. Using it, you can convert a PDF document that contains useful information, such as company names and addresses, to JSON Meta.

In this article, you will use the AI-powered PDF.CO API to convert a PDF document to JSON Meta in two ways: from a PDF file available at a URL, and from a PDF file uploaded from your local drive.

PDF to JSON Meta from URL

In this section, you will see how to convert a PDF document to JSON Meta using a PDF file available at a URL.

Steps

The scripts in this section are executed inside a C# Console application.

The following code imports the required libraries.

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

All of these namespaces are part of .NET except Newtonsoft.Json, which is distributed as a NuGet package. You can install it from the Microsoft Visual Studio IDE, or from the command line by running "dotnet add package Newtonsoft.Json" in the project folder.

In Visual Studio, go to “Tools > NuGet Package Manager > Manage NuGet Packages for Solution”. On the Browse tab, search for “Newtonsoft.Json”, select the Newtonsoft.Json package (the first result), and click the “Install” button.

Newtonsoft.Json Library

The sample file used in this section to convert PDF to JSON Meta can be found at this link:

https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-json/sample.pdf

To start, you need to create variables that store the PDF.CO API key, the URL of the source PDF document, the document's password (an empty string if it has none), and the path of the destination JSON file. The following script defines these variables.

    const string API_KEY = "**************************";
    const string SourceFile = @"https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-json/sample.pdf";
    const string DestinationFile = @".\newDocument.json";
    const string Password = "";
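Keeping the key in source code, as above, makes it easy to leak when the project is shared. A minimal sketch, assuming .NET 6 or later with top-level statements, reads the key from an environment variable instead. The variable name PDFCO_API_KEY and the GetApiKey helper are hypothetical choices for illustration, not something PDF.CO requires.

```csharp
using System;

// Hypothetical helper: reads the API key from an environment variable.
// The default name PDFCO_API_KEY is a convention used only in this sketch.
static string GetApiKey(string variableName = "PDFCO_API_KEY")
{
    string? key = Environment.GetEnvironmentVariable(variableName);
    if (string.IsNullOrWhiteSpace(key))
    {
        // Fail early, before any request is sent without credentials.
        throw new InvalidOperationException(
            $"Set the {variableName} environment variable to your PDF.CO API key.");
    }
    return key.Trim();
}

// Usage: report whether a key is configured without printing the key itself.
try
{
    GetApiKey();
    Console.WriteLine("API key found.");
}
catch (InvalidOperationException e)
{
    Console.WriteLine(e.Message);
}
```

With this in place, API_KEY would no longer be a const; you would pass GetApiKey() to webClient.Headers.Add instead.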

Note: You can get your API key by registering at https://app.pdf.co.

The next step is to create a WebClient object, which makes the calls to the PDF.CO Web API. The API key is sent with every request in the x-api-key header. The following script creates the WebClient object and adds that header.

           WebClient webClient = new WebClient();
           webClient.Headers.Add("x-api-key", API_KEY);
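WebClient still works, but it is marked obsolete in .NET 6 and later (compiler warning SYSLIB0014), where HttpClient is the recommended replacement. The sketch below, assuming .NET 6 or later, shows the equivalent client setup: the key goes into DefaultRequestHeaders so that it is sent with every request. CreatePdfCoClient is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Net.Http;

// Hypothetical helper: an HttpClient that sends the PDF.CO API key in the
// x-api-key header on every request, like webClient.Headers.Add("x-api-key", API_KEY).
static HttpClient CreatePdfCoClient(string apiKey)
{
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("x-api-key", apiKey);
    return client;
}

// Usage: the header is configured without sending any request.
using HttpClient client = CreatePdfCoClient("demo-key");
Console.WriteLine(client.DefaultRequestHeaders.GetValues("x-api-key").Single());
```

In a real application, a single HttpClient is usually created once and reused for all calls rather than created per request.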

The endpoint URL of the PDF.CO API call that converts a PDF document to JSON Meta is stored in a string variable, as shown in the script below.

string url = "https://api.pdf.co/v1/pdf/convert/to/json-meta";

Next, you need to create a parameter dictionary containing the information that will be passed to the API call that converts PDF documents to JSON Meta.

The parameters are the name of the destination JSON file, the URL of the source PDF file, the password of the PDF file (an empty string if it has none), and the pages you want to convert. An empty pages value, as used here, converts all pages.

           Dictionary<string, object> parameters = new Dictionary<string, object>();
           parameters.Add("name", Path.GetFileName(DestinationFile));
           parameters.Add("password", Password);
           parameters.Add("pages", "");
           parameters.Add("url", SourceFile);

The API also accepts other optional parameters, which are listed in the official PDF.CO API documentation.

The parameter dictionary is then serialized using the script below:

           string jsonPayload = JsonConvert.SerializeObject(parameters);
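If you would rather avoid the Newtonsoft dependency, System.Text.Json (built into .NET Core 3.0 and later) can produce the same flat JSON object from a dictionary of string values. A minimal sketch, assuming .NET 6 or later; BuildJsonMetaPayload is a hypothetical helper, and the parameter names are the ones used in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper that builds the same request body as the article's
// Dictionary + JsonConvert.SerializeObject code, using System.Text.Json instead.
static string BuildJsonMetaPayload(string sourceUrl, string outputName, string password = "", string pages = "")
{
    var parameters = new Dictionary<string, object>
    {
        ["name"] = outputName,      // file name for the generated JSON
        ["password"] = password,    // password of the source PDF, empty if none
        ["pages"] = pages,          // empty string converts all pages
        ["url"] = sourceUrl         // URL of the source PDF
    };
    return JsonSerializer.Serialize(parameters);
}

// Usage: print the payload that would be sent to the json-meta endpoint.
string payload = BuildJsonMetaPayload(
    "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-json/sample.pdf",
    "newDocument.json");
Console.WriteLine(payload);
```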

You are now ready to call the PDF.CO Web API. The rest of the script runs inside a try/catch block.

In the script below, the “UploadString()” method of the WebClient object sends the JSON payload to the endpoint in an HTTP POST request.

The response is parsed as a JSON object. If its “error” field is true, the error message from the “message” field is printed. Otherwise, the URL of the converted JSON Meta file is read from the “url” field and passed to the “DownloadFile()” method of the WebClient object, which saves the file to the destination path.

           try
           {
               string response = webClient.UploadString(url, jsonPayload);

               JObject json = JObject.Parse(response);

               if (json["error"].ToObject<bool>() == false)
               {
                   // URL of the converted JSON Meta file
                   string resultFileUrl = json["url"].ToString();

                   Console.WriteLine(resultFileUrl);

                   webClient.DownloadFile(resultFileUrl, DestinationFile);

                   Console.WriteLine("Generated JSON file saved as \"{0}\".", DestinationFile);
               }
               else
               {
                   Console.WriteLine(json["message"].ToString());
               }
           }
           catch (WebException e)
           {
               Console.WriteLine(e.ToString());
           }
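The check json["error"].ToObject<bool>() throws a NullReferenceException if a response ever lacks the “error” field, because the JObject indexer returns null for a missing property. The sketch below, assuming .NET 6 or later and using System.Text.Json in place of Newtonsoft, applies the same success/failure logic described above but treats a missing or malformed field as a failure. TryGetResultUrl is a hypothetical helper, and the sample response contains only the “error” and “url” fields this article reads; real responses may include more.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: interprets a PDF.CO response body the way the article does.
// "error" == false means success and "url" holds the result file; otherwise
// "message" explains the failure. A missing "error" field counts as a failure.
static bool TryGetResultUrl(string responseBody, out string resultUrl, out string message)
{
    resultUrl = "";
    message = "";
    using JsonDocument doc = JsonDocument.Parse(responseBody);
    JsonElement root = doc.RootElement;

    bool isError = !root.TryGetProperty("error", out JsonElement errorElement)
                   || errorElement.ValueKind != JsonValueKind.False;
    if (isError)
    {
        message = root.TryGetProperty("message", out JsonElement msg) ? msg.ToString() : "Unknown error";
        return false;
    }

    if (!root.TryGetProperty("url", out JsonElement urlElement) || urlElement.ValueKind != JsonValueKind.String)
    {
        message = "The response does not contain a result URL.";
        return false;
    }

    resultUrl = urlElement.GetString()!;
    return true;
}

// Usage with an illustrative (not real) response body.
string sample = "{\"error\": false, \"url\": \"https://example.com/newDocument.json\"}";
if (TryGetResultUrl(sample, out string url, out string error))
    Console.WriteLine($"Download from {url}");
else
    Console.WriteLine(error);
```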
 

Finally, the WebClient object is disposed to release its resources, and the program waits for a key press so that the console window stays open:

           webClient.Dispose();

           Console.WriteLine();
           Console.WriteLine("Press any key...");
           Console.ReadKey();

Complete Code for PDF to JSON Meta from URL

The complete code for the Console application that converts a PDF document to JSON Meta is as follows:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace ByteScoutWebApiExample
{
    class Program
    {
        const string API_KEY = "**************************";
        const string SourceFile = @"https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-json/sample.pdf";
        const string DestinationFile = @".\newDocument.json";
        const string Password = "";

        static void Main(string[] args)
        {
            // WebClient that sends the API key with every request
            WebClient webClient = new WebClient();
            webClient.Headers.Add("x-api-key", API_KEY);

            // PDF.CO endpoint that converts a PDF document to JSON Meta
            string url = "https://api.pdf.co/v1/pdf/convert/to/json-meta";

            // Request parameters: output name, PDF password, pages (empty = all), source URL
            Dictionary<string, object> parameters = new Dictionary<string, object>();
            parameters.Add("name", Path.GetFileName(DestinationFile));
            parameters.Add("password", Password);
            parameters.Add("pages", "");
            parameters.Add("url", SourceFile);

            // Convert dictionary of params to JSON
            string jsonPayload = JsonConvert.SerializeObject(parameters);

            try
            {
                // Send the conversion request (HTTP POST)
                string response = webClient.UploadString(url, jsonPayload);

                JObject json = JObject.Parse(response);

                if (json["error"].ToObject<bool>() == false)
                {
                    // URL of the converted JSON Meta file
                    string resultFileUrl = json["url"].ToString();

                    Console.WriteLine(resultFileUrl);

                    webClient.DownloadFile(resultFileUrl, DestinationFile);

                    Console.WriteLine("Generated JSON file saved as \"{0}\".", DestinationFile);
                }
                else
                {
                    Console.WriteLine(json["message"].ToString());
                }
            }
            catch (WebException e)
            {
                Console.WriteLine(e.ToString());
            }

            webClient.Dispose();

            Console.WriteLine();
            Console.WriteLine("Press any key...");
            Console.ReadKey();
        }
    }
}
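For .NET 6 and later, where WebClient is obsolete, the same URL-based flow can be written with HttpClient and async/await. This is a sketch under stated assumptions, not an official PDF.CO sample: the request body is sent as application/json, System.Text.Json replaces Newtonsoft, the key comes from a hypothetical PDFCO_API_KEY environment variable, and BuildConvertRequest and ConvertAsync are illustrative helper names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Builds the POST request for the json-meta endpoint. The parameters mirror the
// article's dictionary; sending them as application/json is an assumption.
static HttpRequestMessage BuildConvertRequest(string apiKey, string sourceUrl, string outputName)
{
    var parameters = new Dictionary<string, object>
    {
        ["name"] = outputName,
        ["password"] = "",
        ["pages"] = "",          // empty string converts all pages
        ["url"] = sourceUrl
    };
    var request = new HttpRequestMessage(HttpMethod.Post, "https://api.pdf.co/v1/pdf/convert/to/json-meta")
    {
        Content = new StringContent(JsonSerializer.Serialize(parameters), Encoding.UTF8, "application/json")
    };
    request.Headers.Add("x-api-key", apiKey);
    return request;
}

// Sends the conversion request and, on success, downloads the result to destinationPath.
static async Task ConvertAsync(HttpClient client, string apiKey, string sourceUrl, string destinationPath)
{
    using HttpRequestMessage request = BuildConvertRequest(apiKey, sourceUrl, Path.GetFileName(destinationPath));
    using HttpResponseMessage response = await client.SendAsync(request);
    string body = await response.Content.ReadAsStringAsync();

    using JsonDocument json = JsonDocument.Parse(body);
    JsonElement root = json.RootElement;
    if (root.TryGetProperty("error", out JsonElement error) && error.ValueKind == JsonValueKind.False)
    {
        string resultUrl = root.GetProperty("url").GetString()!;
        byte[] result = await client.GetByteArrayAsync(resultUrl);
        await File.WriteAllBytesAsync(destinationPath, result);
        Console.WriteLine($"Generated JSON file saved as \"{destinationPath}\".");
    }
    else
    {
        Console.WriteLine(root.TryGetProperty("message", out JsonElement message) ? message.ToString() : body);
    }
}

// Runs the conversion only when a key is provided.
string? key = Environment.GetEnvironmentVariable("PDFCO_API_KEY");
if (string.IsNullOrEmpty(key))
{
    Console.WriteLine("Set the PDFCO_API_KEY environment variable to run the conversion.");
}
else
{
    using var httpClient = new HttpClient();
    try
    {
        await ConvertAsync(httpClient, key,
            "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-json/sample.pdf",
            "newDocument.json");
    }
    catch (Exception e) when (e is HttpRequestException || e is JsonException)
    {
        Console.WriteLine(e.Message);
    }
}
```

Separating BuildConvertRequest from ConvertAsync keeps the request construction testable without touching the network.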

Output

Our input PDF file looks like this.

Input

A snapshot of the converted JSON Meta file is shown below. You can see that the company name is extracted successfully.

Output
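This article does not describe the layout of the JSON Meta file, so rather than assume a schema, a minimal sketch (assuming .NET 6 or later and System.Text.Json) can summarize whatever the downloaded file contains: the kind of top-level value and, for an object, its property names. DescribeJsonFile is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: summarizes the top-level structure of a JSON file
// without assuming anything about the JSON Meta schema.
static string DescribeJsonFile(string path)
{
    using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(path));
    JsonElement root = doc.RootElement;
    return root.ValueKind switch
    {
        JsonValueKind.Object => "object with properties: " +
            string.Join(", ", root.EnumerateObject().Select(p => p.Name)),
        JsonValueKind.Array => $"array with {root.GetArrayLength()} items",
        _ => root.ValueKind.ToString()
    };
}

// Usage: run after the conversion has saved newDocument.json.
string outputPath = "newDocument.json";
Console.WriteLine(File.Exists(outputPath) ? DescribeJsonFile(outputPath) : $"{outputPath} not found.");
```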

PDF to JSON Meta from Uploaded File

In this section, you will see how to convert a PDF file on your local drive to JSON Meta by first uploading it to PDF.CO cloud storage.

Steps

As in the previous section, the first step is to define variables that store the PDF.CO API key, the paths of the source and destination files, and the password of the source PDF file. Here is a script that does that.

    const string API_KEY = "**************************";
    const string SourceFile = @".\sample_doc.pdf";
    const string DestinationFile = @".\newDocument.json";
    const string Password = "";

The script that creates the WebClient class object that makes API calls to the PDF.CO Web API is as follows:

WebClient webClient = new WebClient();
webClient.Headers.Add("x-api-key", API_KEY);

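The next step is different from what you saw in the previous section. Before a local file can be converted, it has to be uploaded to PDF.CO cloud storage. The script below calls the PDF.CO API to request a presigned upload URL for the file; this call does not upload anything yet. The response contains two URLs: “presignedUrl”, to which the file is uploaded, and “url”, the address of the uploaded file that is passed to the conversion call later.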

           string query = Uri.EscapeUriString(string.Format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name={0}",
                Path.GetFileName(SourceFile)));

           string response = webClient.DownloadString(query);

           JObject json = JObject.Parse(response);

           string uploadUrl = json["presignedUrl"].ToString();
           string uploadedFileUrl = json["url"].ToString();
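Uri.EscapeUriString is marked obsolete in .NET 6 and later (warning SYSLIB0013), and because it leaves reserved characters such as & unescaped, a file name containing one would break the query string. A minimal sketch that escapes only the file name with Uri.EscapeDataString; BuildPresignedUrlQuery is a hypothetical helper, and the endpoint and query parameters are the ones used above.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds the get-presigned-url query, escaping only the
// file name so characters such as '&' or spaces cannot break the query string.
static string BuildPresignedUrlQuery(string sourceFile)
{
    string name = Uri.EscapeDataString(Path.GetFileName(sourceFile));
    return "https://api.pdf.co/v1/file/upload/get-presigned-url" +
           "?contenttype=application/octet-stream&name=" + name;
}

// Usage: the resulting string can be passed to webClient.DownloadString(...).
Console.WriteLine(BuildPresignedUrlQuery("sample_doc.pdf"));
```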

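To upload the file, the script below uses the “UploadFile()” method of the WebClient object to send it to the presigned URL with an HTTP PUT request. The content-type header is set to application/octet-stream for the upload and removed afterward so that it does not affect the next API call.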

           webClient.Headers.Add("content-type", "application/octet-stream");
           webClient.UploadFile(uploadUrl, "PUT", SourceFile);
           webClient.Headers.Remove("content-type");
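With HttpClient, the upload can be written as an explicit PUT whose body is the file's raw bytes, labeled application/octet-stream. A minimal sketch, assuming .NET 6 or later; BuildUploadRequest and UploadToPresignedUrlAsync are hypothetical helpers that play the role of the UploadFile() call above. Because the content type is attached to this request's content only, no shared header has to be removed before the next API call.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper: a PUT request carrying the file's raw bytes.
static HttpRequestMessage BuildUploadRequest(string presignedUrl, byte[] fileBytes)
{
    var content = new ByteArrayContent(fileBytes);
    // The content type belongs to this request's content, not to the client.
    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
    return new HttpRequestMessage(HttpMethod.Put, presignedUrl) { Content = content };
}

// Sends the upload and throws HttpRequestException if the storage service rejects it.
static async Task UploadToPresignedUrlAsync(HttpClient client, string presignedUrl, string filePath)
{
    byte[] bytes = await File.ReadAllBytesAsync(filePath);
    using HttpRequestMessage request = BuildUploadRequest(presignedUrl, bytes);
    using HttpResponseMessage response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();
}

// Usage (no request is sent here): inspect the request for a small in-memory file.
using HttpRequestMessage preview = BuildUploadRequest("https://example.com/upload", new byte[] { 1, 2, 3 });
Console.WriteLine($"{preview.Method} {preview.RequestUri} ({preview.Content!.Headers.ContentType})");
```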

The rest of the process is similar to what you saw in the previous section. The script below stores the endpoint URL of the API call that converts a PDF document to JSON Meta.

           string url = "https://api.pdf.co/v1/pdf/convert/to/json-meta";

The script below creates the parameter dictionary for the API call. This time the source is the URL of the uploaded PDF file, returned by the presigned-URL request above. The remaining parameters are the password of the uploaded PDF file, the name of the destination JSON file, and the pages to convert (an empty string converts all pages). The dictionary is then serialized to JSON before being passed to the API call.

           Dictionary<string, object> parameters = new Dictionary<string, object>();
           parameters.Add("name", Path.GetFileName(DestinationFile));
           parameters.Add("password", Password);
           parameters.Add("pages", "");
           parameters.Add("url", uploadedFileUrl);

           // Convert dictionary of params to JSON
           string jsonPayload = JsonConvert.SerializeObject(parameters);

The rest of the code executes inside a try/catch block.

The WebClient class object’s “UploadString()” method is used to make a call to the PDF.CO API which converts the uploaded PDF document to JSON Meta.

The response returned by the API call is parsed as a JSON Object.

If the response reports an error, the error message is printed on the console. If a WebException is thrown, the exception is printed on the console.

If no exception occurs, and the response doesn’t contain any error, the URL of the converted JSON Meta file is extracted from the JSON object. The URL is then used by the “DownloadFile()” method of the WebClient class object to download the JSON file.

Here is a script that performs the aforementioned steps:

           try
           {
               response = webClient.UploadString(url, jsonPayload);

               json = JObject.Parse(response);

               if (json["error"].ToObject<bool>() == false)
               {
                   // URL of the converted JSON Meta file
                   string resultFileUrl = json["url"].ToString();

                   Console.WriteLine(resultFileUrl);

                   webClient.DownloadFile(resultFileUrl, DestinationFile);

                   Console.WriteLine("Generated JSON file saved as \"{0}\".", DestinationFile);
               }
               else
               {
                   Console.WriteLine(json["message"].ToString());
               }
           }
           catch (WebException e)
           {
               Console.WriteLine(e.ToString());
           }

Finally, the script below disposes of the WebClient object and waits for a key press before the program exits.

           webClient.Dispose();

           Console.WriteLine();
           Console.WriteLine("Press any key...");
           Console.ReadKey();

Complete Code for PDF to JSON Meta from Uploaded File

The complete code for converting an uploaded PDF file to JSON Meta is as follows:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace ByteScoutWebApiExample
{
    class Program
    {
        const string API_KEY = "**************************";
        const string SourceFile = @".\sample_doc.pdf";
        const string DestinationFile = @".\newDocument.json";
        const string Password = "";

        static void Main(string[] args)
        {
            // WebClient that sends the API key with every request
            WebClient webClient = new WebClient();
            webClient.Headers.Add("x-api-key", API_KEY);

            // Request a presigned URL for uploading the local PDF file
            string query = Uri.EscapeUriString(string.Format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name={0}",
                Path.GetFileName(SourceFile)));

            string response = webClient.DownloadString(query);

            JObject json = JObject.Parse(response);

            string uploadUrl = json["presignedUrl"].ToString();
            string uploadedFileUrl = json["url"].ToString();

            // Upload the file with a PUT request, then remove the content-type header
            webClient.Headers.Add("content-type", "application/octet-stream");
            webClient.UploadFile(uploadUrl, "PUT", SourceFile);
            webClient.Headers.Remove("content-type");

            // PDF.CO endpoint that converts a PDF document to JSON Meta
            string url = "https://api.pdf.co/v1/pdf/convert/to/json-meta";

            // Request parameters: output name, PDF password, pages (empty = all), uploaded file URL
            Dictionary<string, object> parameters = new Dictionary<string, object>();
            parameters.Add("name", Path.GetFileName(DestinationFile));
            parameters.Add("password", Password);
            parameters.Add("pages", "");
            parameters.Add("url", uploadedFileUrl);

            // Convert dictionary of params to JSON
            string jsonPayload = JsonConvert.SerializeObject(parameters);

            try
            {
                // Send the conversion request (HTTP POST)
                response = webClient.UploadString(url, jsonPayload);

                json = JObject.Parse(response);

                if (json["error"].ToObject<bool>() == false)
                {
                    // URL of the converted JSON Meta file
                    string resultFileUrl = json["url"].ToString();

                    Console.WriteLine(resultFileUrl);

                    webClient.DownloadFile(resultFileUrl, DestinationFile);

                    Console.WriteLine("Generated JSON file saved as \"{0}\".", DestinationFile);
                }
                else
                {
                    Console.WriteLine(json["message"].ToString());
                }
            }
            catch (WebException e)
            {
                Console.WriteLine(e.ToString());
            }

            webClient.Dispose();

            Console.WriteLine();
            Console.WriteLine("Press any key...");
            Console.ReadKey();
        }
    }
}