Table of Contents
What is Expression?
Regular Expression and Grouping Constructs
List of Special Functions
How to Extract Text using Search-based Field and Macros?
How to Extract Text using Search-based Field, Macros, and Regex?
How to Extract Text between Two Points?
How to Extract Text using Rectangle Field, Macros, and Regex?
How to Use the Special Functions?
How to Extract Key-Value pair?
How to Use Table Auto Detection?
Regular Expression Video Tutorials
What is Expression?
The Expression property of a template field takes an expression that parses the desired data. It can be used in a Search-based field, which extracts from the entire document, or in a Rectangle field, which extracts the text inside a Rectangle object.
The Expression can contain our custom Macros, pure Regular Expressions, a combination of both, and the names of Special Functions.
Regular Expressions (a.k.a. Regex) are a special pattern language designed for parsing information from strings. Click here to read the quick reference.
Our Macros are built from complex regular expressions. We created them to simplify the writing of expressions. They are available inside the Document Parser Template Editor.
Regular Expression and Grouping Constructs
By default, the expression is treated as simple text (regex: false in the field properties in the template text). In this case, the special regex symbols \.-+?*|$^()[]{} are matched exactly as written.
If we are writing a Regex or a combined expression, we should check the Regex checkbox in the field properties (or set regex: true in the template text). In this case, those special symbols are treated as Regex constructs/commands.
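The difference between the two modes can be illustrated outside the Document Parser. The following is a minimal Python sketch, not the Document Parser's own implementation: it assumes that "simple text" mode behaves like escaping every special symbol, and the sample text is made up for the illustration.

```python
import re

# Made-up sample text and expression, used only for illustration.
text = "Total (USD): $1,250.00"
expression = "(USD): $1,250.00"

# Analogue of regex: false. Every special symbol is escaped,
# so the expression matches itself character by character.
literal_match = re.search(re.escape(expression), text)

# Analogue of regex: true. The same string is read as regex constructs:
# "(USD)" becomes a group and "$" becomes an end-of-string anchor,
# so the pattern no longer matches the text.
regex_match = re.search(expression, text)

print(literal_match.group(0) if literal_match else None)
print(regex_match.group(0) if regex_match else None)
```

The first search finds the text as typed, while the second finds nothing, which is why an expression containing symbols such as ( or $ needs the right mode.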
The Regexes and the combined expressions can contain Grouping Constructs. They are used to add some logic to expressions.
If the expression contains multiple groups, only the value of the last group goes to the result. If we need to specify explicitly which group value should go to the result, that group must be named value. Such a named group looks as follows:
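As a rough analogue in Python, the sketch below mimics the rule described above. Note the assumptions: Python writes a named group as (?P<value>...), which may differ from the syntax the Document Parser expects, and the extract helper and the sample text are hypothetical.

```python
import re

def extract(pattern, text):
    """Hypothetical helper: prefer a group named 'value', otherwise
    fall back to the last group, otherwise return the whole match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    if "value" in match.re.groupindex:
        return match.group("value")
    if match.re.groups:
        return match.group(match.re.groups)
    return match.group(0)

# Made-up sample text.
text = "Invoice No: 12345 Date: 2021-03-04"

# Two unnamed groups: the last one wins.
last_group = extract(r"Invoice No: (\d+) Date: ([\d-]+)", text)

# The first group is named 'value', so it wins even though it is not last.
named_group = extract(r"Invoice No: (?P<value>\d+) Date: ([\d-]+)", text)

print(last_group)
print(named_group)
```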
List of Special Functions
$$funcFindCompany: searches for the company name. The Document Parser contains a large embedded list of existing companies. If a company name from this list is not found, the function tries to detect the company name by a typical postfix such as “Ltd.”, “Inc.”, “Corp.”, etc.
$$funcFindCompanyNext: searches for another company name after the position of the first found company name.
$$funcFindMaxNumber: finds the maximal number in the text that matches the {{Money}} macro (decimal number with optional currency symbol).
$$funcFindMaxDate: finds the maximal date in the text that matches the {{SmartDate}} macro (dates in the most common national formats).
$$funcFindMinDate: finds the minimal date in the text.
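To make the idea behind $$funcFindMaxNumber concrete, here is a minimal Python sketch. It is only an approximation under stated assumptions: the MONEY_LIKE pattern is a hypothetical stand-in for the {{Money}} macro (whose real definition is not shown here), and the sample text is made up.

```python
import re
from decimal import Decimal

# Hypothetical stand-in for the {{Money}} macro: an optional currency
# symbol followed by a decimal number, with or without thousands separators.
MONEY_LIKE = re.compile(
    r"[$€£]?\s?(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"
)

def find_max_number(text):
    """Return the largest money-like number in the text, or None."""
    amounts = [
        Decimal(m.group(1).replace(",", ""))
        for m in MONEY_LIKE.finditer(text)
    ]
    return max(amounts) if amounts else None

# Made-up sample text.
sample = "Subtotal $1,200.50  Tax $96.04  Total $1,296.54"
largest = find_max_number(sample)
print(largest)
```

On an invoice, the largest amount is often the total, which is presumably why such a function is useful there.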
How to Extract Text using Search-based Field and Macros?
1. In this demonstration, we will find Bill To: Lanny Lane Ltd. To get started, click on the Add Object button in the upper left corner and select FIELD based on search.
NOTE: You can follow using the sample documents in the Template Editor. Click on the Load Sample Template button to choose a sample document.
2. Under the Selected Object Properties, type Bill To: in the Expression field and click on the Insert Macro button.
3. Next, select Spaces. The engine will now look for the text Bill To: followed by one or more spaces.
4. Click on the Insert Macro button again and select SentenceWithSingleSpaces. This time the engine will look for the pattern Bill To: followed by one or more spaces, which are followed by words and symbols separated by single spaces.
5. The Expression field now contains Bill To:{{Spaces}}{{SentenceWithSingleSpaces}}. Click on the Test Object button and it will display the result Bill To: Lanny Lane Ltd.
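The steps above can be sketched in Python to show how Macros and literal text might combine into one pattern. This is an illustration under assumptions, not the Document Parser's implementation: the MACROS definitions are hypothetical guesses based on the descriptions in this tutorial, compile_expression is a hypothetical helper, and the sample text is made up.

```python
import re

# Hypothetical macro definitions, guessed from their descriptions.
MACROS = {
    "Spaces": r"[ ]+",                            # one or more spaces
    "SentenceWithSingleSpaces": r"\S+(?: \S+)*",  # words split by single spaces
}

MACRO_TOKEN = re.compile(r"\{\{(\w+)\}\}")

def compile_expression(expression, is_regex=False):
    """Expand {{Macro}} tokens; escape the literal parts unless is_regex."""
    parts = []
    pos = 0
    for token in MACRO_TOKEN.finditer(expression):
        literal = expression[pos:token.start()]
        parts.append(literal if is_regex else re.escape(literal))
        parts.append("(?:" + MACROS[token.group(1)] + ")")
        pos = token.end()
    tail = expression[pos:]
    parts.append(tail if is_regex else re.escape(tail))
    return re.compile("".join(parts))

# Made-up sample text.
text = "Invoice #42\nBill To: Lanny Lane Ltd.    Phone: 465-223-5331\n"
pattern = compile_expression("Bill To:{{Spaces}}{{SentenceWithSingleSpaces}}")
result = pattern.search(text).group(0)
print(result)
```

The match stops after Ltd. because the next gap is wider than a single space, which is the behavior the SentenceWithSingleSpaces description suggests.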
How to Extract Text using Search-based Field, Macros, and Regex?
In the tutorial above, we searched for Bill To: Lanny Lane Ltd. In this next tutorial, we will limit the result to Lanny Lane Ltd. only. All the steps stay the same except for the added Regex step.
Your Expression field should contain Bill To:{{Spaces}}{{SentenceWithSingleSpaces}}. We will wrap the {{SentenceWithSingleSpaces}} Macro in parentheses (), so it should now look like this: Bill To:{{Spaces}}({{SentenceWithSingleSpaces}}).
Everything inside the parentheses will be captured and displayed as the result when Regex is enabled. Check the Regex checkbox and run the Test Object. The result should now show Lanny Lane Ltd.
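The effect of the parentheses can be reproduced with a plain Python regex. In this sketch the Macros are replaced by hypothetical equivalents ([ ]+ for {{Spaces}} and \S+(?: \S+)* for {{SentenceWithSingleSpaces}}), which are assumptions rather than the real macro definitions, and the sample text is made up.

```python
import re

# Made-up sample text.
text = "Bill To: Lanny Lane Ltd.    Phone: 465-223-5331"

# Assumed equivalent of Bill To:{{Spaces}}({{SentenceWithSingleSpaces}})
pattern = re.compile(r"Bill To:[ ]+(\S+(?: \S+)*)")
match = pattern.search(text)

whole_match = match.group(0)  # everything the pattern matched
captured = match.group(1)     # only the part inside the parentheses

print(whole_match)
print(captured)
```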
How to Extract Text between Two Points?
1. We will now move on to a more complex scenario. In this tutorial, we will extract the text between Bill To: and 465-223-5331. We will still use the Search-based field to perform this task.
Kindly click on the Add Object button in the upper left corner and select Field based on search. In the Expression field, type Bill To: and click on the Insert Macro button.
2. Kindly select the Spaces Macro and click on the Insert Macro button again.
3. Next, select the Anything Macro. The Anything Macro will capture all the characters in its way until it finds the next Macro in the expression.
4. In the Expression field, we will wrap the {{Anything}} Macro in parentheses () and give the group the name Value. The expression will now look like the image below:
5. Then, we will add the USPhoneNumber Macro, which is our stopping point.
6. Kindly tick the Regex checkbox and run the Test Object button. The result will show Lanny Lane Ltd. Phone:, which is the text between the two points Bill To: and 465-223-5331.
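A comparable "between two points" extraction can be sketched in Python. The pattern below is an assumption-laden stand-in: .*? plays the role of {{Anything}}, the phone pattern is a hypothetical approximation of {{USPhoneNumber}}, and the named group uses Python's (?P<value>...) syntax, which may differ from what the Document Parser expects.

```python
import re

# Made-up sample text.
text = "Bill To: Lanny Lane Ltd. Phone: 465-223-5331"

pattern = re.compile(
    r"Bill To:[ ]+"                    # starting point
    r"(?P<value>.*?)"                  # lazy match: stop as early as possible
    r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"  # stopping point: a US-style phone number
)

match = pattern.search(text)
between = match.group("value").strip()
print(between)
```

The lazy quantifier is the key design choice here: it grows the capture one character at a time until the stopping pattern can match, so the capture ends right before the phone number.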
How to Extract Text using Rectangle Field, Macros, and Regex?
1. In this tutorial, we will show you how to use an expression in a Rectangle field. We will extract the text inside the Rectangle field in the Invoice, except for the words BILL TO.
To begin, kindly click on the Add Object button in the upper left corner and select FIELD from Rectangle.
2. Kindly drag the Rectangle object over to the text and click on the Insert Macro button.
3. Next, select the ToggleToSingleLineMode Macro. This will allow the AnythingGreedy Macro to go over multiple lines.
4. In the Expression field, enter the words BILL TO and click on the Insert Macro button again.
5. Then, select the AnythingGreedy Macro. The AnythingGreedy Macro will read all the characters in its way until the next Macro or until the end of the line. But because we used the ToggleToSingleLineMode Macro, AnythingGreedy will stop only at the end of the Rectangle field.
6. Now, we will wrap the {{AnythingGreedy}} Macro in parentheses () and add the group name Value to return all the captured characters and symbols. The Expression should now look like the image below.
7. Kindly tick the Regex checkbox and run the Test Object. The result will show the text inside the Rectangle field after the word BILL TO.
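The role of single-line mode can be shown with a short Python sketch. It assumes that ToggleToSingleLineMode behaves like the (?s) inline flag (where the dot also matches line breaks) and that AnythingGreedy behaves like .*; both are assumptions, and the rectangle contents are made up.

```python
import re

# Made-up text, standing in for the contents of the Rectangle field.
rectangle_text = "BILL TO\nLanny Lane Ltd.\n12 Main Street\nSpringfield"

# Default mode: "." does not cross line breaks, so the greedy match
# stops at the end of the first line and captures nothing.
default_mode = re.search(r"BILL TO(?P<value>.*)", rectangle_text)

# Single-line mode: "." also matches line breaks, so the greedy match
# runs to the end of the text.
single_line = re.search(r"(?s)BILL TO(?P<value>.*)", rectangle_text)

default_value = default_mode.group("value")
single_line_value = single_line.group("value").strip()

print(repr(default_value))
print(repr(single_line_value))
```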
How to Use the Special Functions?
1. In this tutorial we will show you how to use the Special Functions to find the Company Names in a document. To get started, click on the Add Object button and select Field based on search.
2. In the Expression field, kindly enter $$funcFindCompany to find the first Company Name and change the object name to Company1 to easily distinguish it from the other objects.
3. Kindly add another Search-based field object and enter $$funcFindCompanyNext to find the next Company Name.
4. Now, click on the Run Template button and you will see the two Company Names in the result.
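The "first company, then next company" flow can be approximated in Python. This sketch covers only the postfix-based fallback described for $$funcFindCompany (it has no embedded company list); the COMPANY_BY_SUFFIX pattern, the find_company helper, and the sample text, including the name Acme Tools Inc., are all hypothetical.

```python
import re

# Hypothetical pattern: capitalized words followed by a typical postfix.
COMPANY_BY_SUFFIX = re.compile(
    r"(?:[A-Z][\w&'-]*\s+)+(?:Ltd|Inc|Corp|LLC)\b\.?"
)

def find_company(text, start=0):
    """Return (name, end_position) of the first company at or after start."""
    match = COMPANY_BY_SUFFIX.search(text, start)
    return (match.group(0), match.end()) if match else None

# Made-up sample text.
text = "Seller: Acme Tools Inc. Bill To: Lanny Lane Ltd. Phone: 465-223-5331"

first = find_company(text)                   # analogue of $$funcFindCompany
second = find_company(text, start=first[1])  # analogue of $$funcFindCompanyNext

print(first[0])
print(second[0])
```

Passing the end position of the first match as the new starting point is what makes the second search behave like a "next" lookup.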
How to Extract Key-Value pair?
1. In this tutorial, we will show you how to extract key-value pairs from an Echocardiogram Report. To begin, click on the +Add Object button and select Add FIELD from KEY VALUE extraction.
2. The Expression will automatically be filled with the key and value macros.
3. Now, click on the Run Template button and it will extract the PDF text into key-value pairs.
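Conceptually, key-value extraction splits each line into a label and its value. The Python sketch below illustrates that idea only; the key and value macros that the editor fills in are not shown in this tutorial, so the KEY_VALUE_LINE pattern is a hypothetical stand-in that assumes a colon delimiter, and the report lines are made up.

```python
import re

# Hypothetical pattern: "key: value" on a single line.
KEY_VALUE_LINE = re.compile(
    r"^[ \t]*(?P<key>[^:\n]+?)[ \t]*:[ \t]*(?P<value>\S.*?)[ \t]*$",
    re.MULTILINE,
)

def extract_key_values(text):
    """Collect every 'key: value' line into a dictionary."""
    return {m.group("key"): m.group("value") for m in KEY_VALUE_LINE.finditer(text)}

# Made-up report text.
report = (
    "Patient Name: John Doe\n"
    "Heart Rate: 72 bpm\n"
    "No delimiter on this line\n"
    "Ejection Fraction: 60%\n"
)

pairs = extract_key_values(report)
for key, value in pairs.items():
    print(key, "=>", value)
```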
How to Use Table Auto Detection?
1. In this tutorial, we will extract the table items using the Add TABLE from TABLE AUTO DETECTION object.
2. After you add the object, it will auto-fill the JSON for finding the table.
3. Now, run the template and it will return the table line items.
Regular Expression Video Tutorials
To dive deeper into Regular Expressions, we have an easy-to-follow Regular Expression for Beginners Video Series that you can watch here.
In the videos, you’ll learn how to use pure Regular Expression, tools to practice Regex, some of our tools where you can apply and practice Regex, and real-life examples to create patterns and parse documents using Regex.
In this tutorial, we learned how to extract specific text in a PDF using Macros, Regex, and Special Functions. We familiarized ourselves with the different expressions and their specific functions.
NOTE: To learn how to create a template using the Template Editor, check out this tutorial.