Here you will learn how to get specific text, such as invoice numbers, out of hundreds or thousands of documents. Document Parser can go through these files and extract the text for you. You can download or export the results as CSV, XML, or JSON, or reuse the text to fill out another document.

Document Parser works with documents such as purchase orders, airline tickets, lab reports, and insurance quote requests, to name a few. Now let’s head over to the Document Parser Template Editor, where we can tell Document Parser what to do.

In the Template Editor, you can create a template for a PDF or image file, edit an existing template, and save the template. You can select any of the Document Parser objects and set template options. Let’s go ahead and open the sample Google Invoice.

It’s a one-page document with company and client information, invoice details, and a table of items. To begin, let’s call this template ‘demo’. The template uses English by default. If you’re working with documents in other languages, such as French, click ‘Template options’ and select fr-FR (French) under Culture. This tells the parser to read currency values in the French format.
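
The snippet below only illustrates why the culture setting matters; it is not how Document Parser implements it. The same characters can stand for different amounts depending on the culture. It is a minimal Python sketch that assumes French amounts use a space (or a no-break space) as the thousands separator and a comma as the decimal mark; the function names are hypothetical.

```python
from decimal import Decimal

# Assumption: French amounts separate thousands with a space, a no-break space
# (U+00A0) or a narrow no-break space (U+202F), and use a comma as the decimal mark.
FR_THOUSANDS_SEPARATORS = (" ", "\u00a0", "\u202f")


def parse_fr_amount(text: str) -> Decimal:
    """Convert a French-formatted amount such as '1 234,56 €' to a Decimal."""
    cleaned = text.replace("€", "").strip()
    for sep in FR_THOUSANDS_SEPARATORS:
        cleaned = cleaned.replace(sep, "")
    return Decimal(cleaned.replace(",", "."))


def parse_en_amount(text: str) -> Decimal:
    """Convert an English-formatted amount such as '$1,234.56' to a Decimal."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned)


print(parse_fr_amount("1 234,56 €"))  # 1234.56
# The same characters mean different amounts under each culture:
print(parse_en_amount("1,234"))  # 1234
print(parse_fr_amount("1,234"))  # 1.234
```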

To get the client’s name, click the ‘Add object’ button and select ‘Add Field from Rectangle Selection’. This object is useful for extracting text at a fixed location. You can drag and drop objects anywhere on the document and resize them to fit the text. To see the result, you can either test the object on its own or run the whole template. Click ‘Run Template’ to see what the output looks like.
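
To picture what a rectangle-selection field does, here is a hypothetical Python sketch. It assumes the page has already been split into words with bounding boxes (coordinates in points, with y growing downward) and keeps only the words that lie entirely inside the selected rectangle. The Word and Rect classes, the coordinates, and the sample words are made up for illustration; Document Parser’s own matching rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward)
    x1: float  # right edge
    y1: float  # bottom edge


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, w: Word) -> bool:
        """True if the word's whole bounding box lies inside the rectangle."""
        return (self.left <= w.x0 and w.x1 <= self.right
                and self.top <= w.y0 and w.y1 <= self.bottom)


def extract_from_rect(words, rect):
    """Join the words inside the rectangle in reading order."""
    inside = [w for w in words if rect.contains(w)]
    # Reading order: top-to-bottom, then left-to-right.
    inside.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in inside)


# Hypothetical words from an invoice page.
page_words = [
    Word("Bill", 40, 100, 60, 110),
    Word("to:", 62, 100, 75, 110),
    Word("Acme", 40, 120, 70, 130),
    Word("Corp", 72, 120, 100, 130),
    Word("Invoice", 300, 100, 340, 110),
]
client_box = Rect(left=35, top=115, right=200, bottom=135)
print(extract_from_rect(page_words, client_box))  # Acme Corp
```

Because the rectangle is fixed, this approach only works when the client name always sits in the same place, which is why a text-search field is used for values that move around.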

You can rename objects in the object properties so they are easy to identify. To get text that has no fixed location, we’ll use ‘Add Field based on Text Search’. To demonstrate, let’s get the invoice number. The Field based on Text Search object finds the exact text and pattern that you enter in the Expression field. To find the invoice number, type the exact words ‘Invoice number:’, click the ‘Insert macro’ button, and select the Spaces macro to match any number of spaces.

The invoice number is made up of digits, and I only want these digits to appear in the result. To do that, add an opening parenthesis, a question mark, an opening angle bracket, the word value, and a closing angle bracket; then, because the invoice number is all digits, select the Digits macro and close the parenthesis. The complete expression looks like this:

Invoice number:{{Spaces}}(?<value>{{Digits}})

When you use a macro, always check the Regex checkbox so the engine knows to look for macros and special characters.
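
Outside of Document Parser, the same expression can be tried in Python. This sketch assumes {{Spaces}} behaves like \s+ and {{Digits}} like \d+; these are guesses, not the documented macro definitions, and the invoice text is made up. Python’s re module spells a named group (?P<value>...) rather than the (?<value>...) form used in the tutorial. The last lines show why the Regex checkbox matters: treated as plain text, the expression is searched for literally and finds nothing.

```python
import re

# Assumed regex equivalents of the macros (not taken from Document Parser).
SPACES = r"\s+"  # {{Spaces}}: one or more whitespace characters (assumption)
DIGITS = r"\d+"  # {{Digits}}: one or more digits (assumption)

# Tutorial expression: Invoice number:{{Spaces}}(?<value>{{Digits}})
# Python's re module writes the named group as (?P<value>...).
expression = "Invoice number:" + SPACES + "(?P<value>" + DIGITS + ")"

# Made-up invoice text for illustration.
text = "Example Invoice\nInvoice number:    1234567\nInvoice date: May 1"

match = re.search(expression, text)
print(match.group("value"))  # 1234567

# With the Regex box unchecked, the expression is taken as literal text,
# so the braces and brackets are searched for character by character.
literal_expr = "Invoice number:{{Spaces}}(?<value>{{Digits}})"
print(re.search(re.escape(literal_expr), text))  # None
```

Only the part inside the named group ends up in the result, which is why the label and the spaces do not appear in the output.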

Regex stands for regular expression, and macros are made up of regular expressions. We made the macros to make it easier for you to put together a search pattern. The Expression field accepts both macros and regular expressions. Click ‘Run Template’ to see the result. The ‘Add static field’ object lets you add text that isn’t in the document, such as the company name or any other text you want to include in the result.
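
The idea that a macro is shorthand for a regular expression can be sketched in Python: replace each {{Name}} token with its regex, leave ordinary regex untouched, then search. The macro table, field names, and sample text below are hypothetical. The static field is simply copied into the result without searching the document, which mirrors what the paragraph describes.

```python
import re

# Hypothetical macro table for illustration; the real definitions may differ.
MACROS = {
    "Spaces": r"\s+",
    "Digits": r"\d+",
}
MACRO_TOKEN = re.compile(r"\{\{(\w+)\}\}")


def expand_macros(expression):
    """Replace each {{Name}} token with its regex; other regex text is kept as-is."""
    def substitute(m):
        name = m.group(1)
        if name not in MACROS:
            raise KeyError(f"unknown macro: {name}")
        return MACROS[name]  # returned verbatim, backslashes are not reprocessed
    return MACRO_TOKEN.sub(substitute, expression)


def run_fields(text, search_fields, static_fields):
    """Search fields use expressions; static fields are copied into the result as-is."""
    result = dict(static_fields)
    for name, expression in search_fields.items():
        m = re.search(expand_macros(expression), text)
        result[name] = m.group("value") if m else None
    return result


text = "Invoice number:  1234567\nTotal:   $45.00"
search_fields = {
    "invoiceNumber": r"Invoice number:{{Spaces}}(?P<value>{{Digits}})",  # macros
    "total": r"Total:\s*\$(?P<value>[\d.,]+)",  # plain regular expression
}
print(run_fields(text, search_fields, {"company": "Example Co."}))
# {'company': 'Example Co.', 'invoiceNumber': '1234567', 'total': '45.00'}
```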

Now, let’s move on to the table objects. Use ‘Add Table with Rectangle Selection’ to get table items at fixed locations. Use ‘Add Table Field based on Text Search’ for tables with no fixed location and for multipage tables. To demonstrate, let’s use this two-page document with one table that continues onto the next page.

Copy the contents of the properties field and paste them into a text editor. First, let’s set a starting point in the document, because I only want to display the table items. I’ll tell the engine to start parsing everything that follows ‘Quantity’ and ‘Extended Price’. Next, let’s set the ending point: I’ll tell the engine to stop parsing once it finds the page footer or the total.
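
The start and end settings can be thought of as two searches that cut the table region out of the page text. This hypothetical Python sketch uses made-up page text; the start pattern matches the header line containing Quantity and Extended Price, and the end pattern matches whichever comes first, a line beginning with Total or a ‘Page N of M’ footer. It does not reproduce Document Parser’s template syntax.

```python
import re

# Made-up page text; only the header and footer wording follows the tutorial.
page_text = (
    "Example Supplies Invoice\n"
    "Item Description Quantity Unit Price Extended Price\n"
    "A-100 Widget 2 5.00 10.00\n"
    "B-200 Gadget 1 7.50 7.50\n"
    "Total 17.50\n"
    "Page 1 of 2\n"
)

# Start right after the header line that contains Quantity ... Extended Price.
START = re.compile(r"Quantity.*Extended Price[^\n]*\n", re.IGNORECASE)
# Stop at the first line that begins with Total or a "Page N of M" footer.
END = re.compile(r"^(?:Total|Page \d+ of \d+)", re.IGNORECASE | re.MULTILINE)


def table_region(text):
    """Return the text between the start and end anchors ('' if no start)."""
    start = START.search(text)
    if start is None:
        return ""
    end = END.search(text, start.end())
    stop = end.start() if end else len(text)
    return text[start.end():stop]


print(table_region(page_text))
```

On a page where the table continues past the bottom, there is no Total line yet, so the footer is what ends the region.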

This table has a very simple layout and content, so I’ll remove the row definition. There are five columns in the table, so let’s add two more columns, rename each column, and assign it the correct data type so the columns are easy to identify. Because this is a multipage table, let’s set multipage to true after the columns. Copy all of this back into the properties field and click ‘Run Template’ to see the result.
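
Once a table region has been cut out of each page, splitting it into typed columns and joining the pages might look like the Python sketch below. The column names, types, and row pattern (item code, free-text description, then three numbers) are assumptions about a simple five-column table, not Document Parser’s template format; the page regions stand in for what the start and end settings would cut out of each page.

```python
import re
from decimal import Decimal

# Hypothetical column names and data types for the five-column table.
COLUMNS = [
    ("item", str),
    ("description", str),
    ("quantity", int),
    ("unitPrice", Decimal),
    ("extendedPrice", Decimal),
]

# Assumed row layout: item code, free-text description, then three numbers.
ROW = re.compile(r"^(\S+)\s+(.+?)\s+(\d+)\s+([\d.]+)\s+([\d.]+)$")


def parse_rows(region):
    """Split one page's table region into rows of typed column values."""
    rows = []
    for line in region.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # skip blank or non-row lines
        rows.append({name: cast(value) for (name, cast), value in zip(COLUMNS, m.groups())})
    return rows


def parse_multipage(regions):
    """Treat the regions from consecutive pages as one logical table."""
    rows = []
    for region in regions:
        rows.extend(parse_rows(region))
    return rows


page1 = "A-100 Widget 2 5.00 10.00\nB-200 Gadget 1 7.50 7.50\n"
page2 = "C-300 Spare part kit 3 2.00 6.00\n"
for row in parse_multipage([page1, page2]):
    print(row)
```

Assigning a type to each column is what lets quantities and prices come out as numbers rather than strings.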

All of your saved templates are kept on the Document Parser page. You can use a template’s ID to call it from automation software such as Zapier and Integromat, or download the JSON to use in your own programs. We have plenty of sample documents with ready-made templates that you can check out in the sample templates.

You can look for a document with the same layout as yours and edit its template to get the result you want. For more information on this and other features, click Support and Help, check out our API docs and source code samples, join our Facebook group to interact with other members, or contact our customer support team.