This tutorial shows you how to extract a table from a PDF with Node.js.
Below is the image of the source PDF invoice and the extracted table with text output in JSON format.
The following steps explain how to set up your environment, walk through the code, and show how to run the program.
Step 1: Source Code and Template
To begin extracting tables from a PDF, open Visual Studio Code (or your favorite editor) and save the following files.
Note: You can also download a zip bundle from the page with the source code and template.
Step 2: Install Requests Module
- Navigate to the folder where you downloaded the files, or where you want to keep the project.
- To install the requests module, type npm install requests in your command line interface (CLI).
You will notice that a new node_modules folder has appeared. Don't worry, this is expected!
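If you want to confirm that the install step worked before moving on, a small check like the one below can help. This is only a sketch: the helper name isInstalled is made up for illustration, and the module name is the one used in the install command above.

```javascript
// Hypothetical helper: reports whether a module can be resolved from the current folder.
function isInstalled(moduleName) {
  try {
    require.resolve(moduleName);
    return true;
  } catch (err) {
    // MODULE_NOT_FOUND means Node could not locate the package.
    if (err.code === "MODULE_NOT_FOUND") return false;
    throw err;
  }
}

// "requests" is the module name from this tutorial's install command.
for (const name of ["requests"]) {
  const status = isInstalled(name) ? "installed" : `missing, run: npm install ${name}`;
  console.log(`${name}: ${status}`);
}
```

Save it as a separate file in the project folder and run it with node; a "missing" message suggests the install command was run in a different directory.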
Step 3: Insert API Key
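The step title suggests placing your API key in the source code. As an alternative sketch, you could keep the key out of the file and read it from an environment variable. The variable name API_KEY and the helper getApiKey are assumptions for illustration and are not part of the tutorial's source code.

```javascript
// Assumption: the key is stored in an environment variable named API_KEY (hypothetical name).
function getApiKey(env = process.env) {
  const key = (env.API_KEY || "").trim();
  if (!key) {
    throw new Error("API key is missing: set the API_KEY environment variable");
  }
  return key;
}

// Example usage (uncomment once the variable is set):
// const apiKey = getApiKey();
```

On Mac & Linux you could then start the program with API_KEY=your-key node app.js, which keeps the key out of files you might share.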
Step 4: Source and Destination File
On line 15, add your source PDF file, then on line 19 type your desired output filename. Aside from JSON output, you can also extract tables with text in CSV and XML formats.
Note: On Mac & Linux systems, filenames do not require the leading ./ prefix.
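To illustrate how the source and destination names relate, here is a minimal sketch using Node's built-in path module. The file name sample.pdf and the helpers resolveLocal and outputNameFor are hypothetical; the list of formats comes from the step above.

```javascript
const path = require("path");

// Hypothetical helper: turns a relative file name into an absolute path
// (relative to the folder you run node from).
function resolveLocal(fileName) {
  return path.resolve(fileName);
}

// Hypothetical helper: swaps the file extension to match the chosen output format.
function outputNameFor(baseName, format) {
  const allowed = ["json", "csv", "xml"];
  const ext = format.toLowerCase();
  if (!allowed.includes(ext)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  const parsed = path.parse(baseName);
  return path.format({ dir: parsed.dir, name: parsed.name, ext: "." + ext });
}

// With and without the leading ./ the resolved path is the same.
console.log(resolveLocal("./sample.pdf") === resolveLocal("sample.pdf"));
console.log(outputNameFor("result.json", "csv"));
```

Resolving the path up front also makes it easy to print the exact location the program will read from, which helps when a file is reported as not found.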
Step 5: Add Template
On line 96, check the path to the template file. The Document Parser supports both JSON and YML template formats. For more details about the Document Parser, check out this page.
Note: On Mac & Linux systems, the leading ./ before the filename is not required.
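Before sending a template anywhere, it can be useful to check that the path points at a readable file of the expected type. The sketch below does that with Node's built-in modules only; the helper names templateFormat and loadTemplate are hypothetical, and the extension check simply mirrors the two formats named above.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: classifies a template by its file extension.
function templateFormat(templatePath) {
  const ext = path.extname(templatePath).toLowerCase();
  if (ext === ".json") return "json";
  if (ext === ".yml" || ext === ".yaml") return "yml";
  throw new Error(`Unexpected template extension: ${ext || "(none)"}`);
}

// Reads the template as text. JSON templates are also parsed so that
// syntax errors surface here rather than later in the program.
function loadTemplate(templatePath) {
  const format = templateFormat(templatePath);
  const text = fs.readFileSync(templatePath, "utf8");
  if (format === "json") {
    JSON.parse(text); // throws SyntaxError on malformed JSON
  }
  return { format, text };
}
```

If loadTemplate throws, the error message usually points at either a wrong path (ENOENT) or a malformed template, which narrows down the problem quickly.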
To run the program, simply type node app.js in the command line interface (CLI).
At this point you should see the resulting output file (result.json) with the table data extracted!
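Once the output file exists, you may want a quick look at what it contains. The sketch below makes no assumptions about the structure of the JSON; it only reports the top-level shape. The helper names summarize and inspectResult are hypothetical.

```javascript
const fs = require("fs");

// Hypothetical helper: describes a parsed JSON value without assuming its schema.
function summarize(value) {
  if (Array.isArray(value)) {
    return `array with ${value.length} item(s)`;
  }
  if (value !== null && typeof value === "object") {
    return `object with keys: ${Object.keys(value).join(", ")}`;
  }
  return `${typeof value}: ${String(value)}`;
}

// Reads and parses the output file, then prints a one-line summary.
function inspectResult(filePath = "result.json") {
  const parsed = JSON.parse(fs.readFileSync(filePath, "utf8"));
  console.log(summarize(parsed));
  return parsed;
}

// Example usage (uncomment after running node app.js):
// inspectResult("result.json");
```

From there you can drill into whichever key holds the table rows in your own output.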
Use Cases - Extract Specific Data from Invoices
Here are a few use cases for the extraction of specific data from invoices:
Accounts Payable Automation
By extracting data such as invoice numbers, dates, vendor names, and amounts from scanned invoices, companies can streamline the process of paying bills, reducing errors and saving time.
Extracting data from receipts and invoices for business expenses helps individuals keep track of their spending and ensure that they record expenses accurately for tax and accounting purposes.
Many industries have strict regulations around financial reporting and record-keeping. By extracting data from scanned invoices, companies can ensure that they are complying with these regulations and avoid fines and penalties.
By extracting data from scanned invoices and other financial documents, companies can gain insights into their business performance. For example, they may be able to identify trends in spending, compare the profitability of different products or services, or assess the financial health of their business.