How to Convert PDF to CSV from File (Node) for PDF to CSV API in JavaScript with PDF.co Web API

What is PDF.co Web API? It is a REST API that provides a set of data extraction functions, tools for document manipulation, and splitting and merging of PDF files. It includes built-in OCR and image recognition, and can generate and read barcodes from images, scans, and PDFs.

In this article, we’ll look at how to convert PDF data to CSV format with the help of PDF.co. The code is written in Node.js, and the snippet is given in this article. You can also get the full code snippet from the PDF.co GitHub repository at this location. For code snippets in other languages such as C#, Java, and Python, please visit this link.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

Let’s review the source code and its output first, then briefly analyze the code.

Source Code

app.js

      
/*jshint esversion: 6 */

var https = require("https");
var path = require("path");
var fs = require("fs");

// `request` module is required for file upload.
// Use "npm install request" command to install.
var request = require("request");

// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co/documentation/api
const API_KEY = "**************************************";

// Source PDF file
const SourceFile = "./sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const Pages = "";
// PDF document password. Leave empty for unprotected documents.
const Password = "";
// Destination CSV file name
const DestinationFile = "./result.csv";

// Prepare URL for `PDF To CSV` API call
var query = `https://api.pdf.co/v1/pdf/convert/to/csv`;
let reqOptions = {
    uri: query,
    headers: { "x-api-key": API_KEY },
    formData: {
        name: path.basename(DestinationFile),
        password: Password,
        pages: Pages,
        file: fs.createReadStream(SourceFile)
    }
};

// Send request
request.post(reqOptions, function (error, response, body) {
    if (error) {
        return console.error("Error: ", error);
    }

    // Parse JSON response
    let data = JSON.parse(body);
    if (data.error == false) {
        // Download CSV file
        var file = fs.createWriteStream(DestinationFile);
        https.get(data.url, (response2) => {
            response2.pipe(file)
                .on("close", () => {
                    console.log(`Generated CSV file saved as "${DestinationFile}" file.`);
                });
        });
    }
    else {
        // Service reported error
        console.log("Error: " + data.message);
    }
});

package.json

      
{
    "name": "test",
    "version": "1.0.0",
    "description": "PDF.co",
    "main": "app.js",
    "scripts": {
    },
    "keywords": [
        "pdf.co",
        "web",
        "api",
        "bytescout",
        "api"
    ],
    "author": "ByteScout & PDF.co",
    "license": "ISC",
    "dependencies": {
        "request": "^2.88.2"
    }
}

Output

Now that we’ve seen the source code and its output – let’s analyze the source code.

First, we import the necessary Node modules (https, path, fs, and request) to make the API call and download the file. We also declare the PDF.co API key, the path to the source PDF file, the pages to be processed, the document password, the destination file path, etc.
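The sample hardcodes the API key in app.js. As an optional variation, the key could be read from an environment variable so that it stays out of source control. This is only a minimal sketch: the variable name PDFCO_API_KEY and the helper loadApiKey are hypothetical names chosen for illustration and are not part of PDF.co or of the original sample.

```javascript
// Read the API key from the environment instead of hardcoding it.
// PDFCO_API_KEY is a hypothetical variable name chosen for this sketch.
function loadApiKey(env = process.env) {
    const key = (env.PDFCO_API_KEY || "").trim();
    if (key === "") {
        throw new Error("Set the PDFCO_API_KEY environment variable before running app.js");
    }
    return key;
}
```

With this helper in place, the declaration in app.js could become const API_KEY = loadApiKey();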

To convert PDF to CSV, we use the PDF.co API endpoint /v1/pdf/convert/to/csv. When building the request, we pass the PDF.co API key in the x-api-key header, which authenticates the request. We also pass parameters such as the input PDF file (file), the expected output file name (name), the pages to be converted from PDF to CSV (pages), and the document password (password) as form data. As you may have noticed, the source PDF file is opened as a stream with the createReadStream method, and that stream is passed as the input data.
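To make the parts of the request easier to see, the options object from app.js can be wrapped in a small function. This is a sketch only: buildCsvRequestOptions is a hypothetical helper name, and it takes an already opened stream as an argument (an assumption made so the function can be exercised without touching the file system). The endpoint, header, and form fields are the ones used in the sample above.

```javascript
const path = require("path");

// Build the options object for the PDF to CSV call, using the same
// endpoint, header, and form fields as app.js.
// `fileStream` is expected to be a readable stream of the source PDF,
// for example fs.createReadStream("./sample.pdf").
function buildCsvRequestOptions({ apiKey, fileStream, destinationFile, pages = "", password = "" }) {
    return {
        uri: "https://api.pdf.co/v1/pdf/convert/to/csv",
        headers: { "x-api-key": apiKey },
        formData: {
            // Only the file name is sent, not the local directory part
            name: path.basename(destinationFile),
            password: password,
            pages: pages,
            file: fileStream
        }
    };
}
```

The returned object can be passed directly to request.post in place of reqOptions.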

Now that the request parameters are ready, we execute the request with request.post. The response body is parsed as JSON; if it reports no error, we download the output CSV file from the response URL (data.url) and save it to the destination file. Otherwise, the error message returned by the service is printed.

In this sample, we call PDF.co PDF to CSV with very basic input options. We can fine-tune this request as required to get customized output. Please visit the PDF.co documentation for more information on the available parameters.

To get more from this article, please try this sample on your machine. Thank you for reading!
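The sample calls JSON.parse directly, which throws if the body is not valid JSON (for example, when a proxy returns an HTML error page). A slightly more defensive version of the response handling could look like the sketch below. The helper name extractResultUrl is hypothetical; it relies only on the response fields already used in app.js (error, url, and message), and it checks error with a strict comparison, which is a little stricter than the == false test in the sample.

```javascript
// Interpret the JSON body returned by the PDF to CSV endpoint.
// Assumes the fields used in app.js: `error`, `url`, and `message`.
function extractResultUrl(body) {
    let data;
    try {
        data = JSON.parse(body);
    } catch (parseError) {
        throw new Error("Response is not valid JSON: " + parseError.message);
    }
    if (!data || data.error !== false) {
        const message = (data && data.message) || "unknown error";
        throw new Error("Service reported error: " + message);
    }
    return data.url;
}
```

Inside the request.post callback, the returned URL would then be passed to https.get in the same way as data.url is in the sample.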
