Table of Contents

 
What is Expression?
Regular Expression and Grouping Constructs
List of Special Functions
How to Extract Text using Search-based Field and Macros?
How to Extract Text using Search-based Field, Macros and Regex?
How to Extract Text between Two Points?
How to Extract Text using Rectangle Field, Macros and Regex?
How to use the Special Functions?
Regular Expression Video Tutorials
 

What is Expression?

 
The Expression property of a template field takes an expression that parses the desired data. It can be used in a Search-based field, which extracts from the entire document, or in a Rectangle field, which extracts the text inside a Rectangle object.

The Expression can contain our custom Macros, pure Regular Expressions, a combination of both, and the names of Special Functions.

Regular Expressions (a.k.a. Regex) are a special pattern language designed for parsing information from strings. See the Regex quick reference for details.

Our Macros are built from complex regular expressions. We created them to simplify the writing of expressions. They are available inside the Document Parser Template Editor.

 

Regular Expression and Grouping Constructs

 
By default, the expression is treated as simple text (regex: false in the field properties in the template text). In this case, the special regex symbols \.-+?*|$^()[]{} are matched exactly as typed.

If we are writing a Regex or a combined expression, we should tick the Regex checkbox in the field properties (or set regex: true in the template text). In this case, those special symbols are treated as Regex constructs/commands.
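The difference between the two modes can be illustrated with a small Python sketch. It is not the Document Parser's own code: it assumes plain-text mode behaves like escaping every special symbol, and the sample string is made up for the illustration.

```python
import re

# Hypothetical sample text, not taken from the sample documents.
text = "Total (USD): 1.50"

# Plain-text mode (regex: false): special symbols are matched literally,
# which is what re.escape achieves in Python.
literal = re.compile(re.escape("Total (USD):"))

# Regex mode (regex: true): the parentheses now form a group, so this
# pattern looks for "Total USD:" without literal parentheses.
as_regex = re.compile("Total (USD):")

literal_match = literal.search(text)
regex_match = as_regex.search(text)

print(literal_match.group(0) if literal_match else None)
print(regex_match.group(0) if regex_match else None)
```

The same characters find the text in one mode and miss it in the other, which is why the Regex checkbox matters whenever the expression contains special symbols.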

Regexes and combined expressions can contain Grouping Constructs, which add logic to an expression.

If the expression contains multiple groups, only the value of the last group goes to the result. To specify explicitly which group's value should go to the result, name that group value. Such a named group looks as follows:

Name A Group Value To Get Only The Result
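As a rough illustration of the two rules (last group wins, unless a group is named value), here is a Python sketch. Python spells named groups as (?P<value>...); the Document Parser's regex flavour may use a different spelling, so treat the syntax and the sample invoice text as assumptions.

```python
import re

# Hypothetical sample text for the illustration.
text = "Invoice No: INV-1042 Date: 2020-03-15"

# Two groups: the first is named "value", the second is unnamed.
pattern = re.compile(
    r"Invoice No:\s+(?P<value>[A-Z]+-\d+)\s+Date:\s+(\d{4}-\d{2}-\d{2})"
)
match = pattern.search(text)

# Without a named group, the last group would be the result.
last_group = match.group(match.lastindex)

# With a group named "value", that group is picked explicitly.
named_value = match.group("value")

print(last_group)
print(named_value)
```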

 

List of Special Functions

 

  • $$funcFindCompany – searches for the company name. The Document Parser contains a large embedded list of existing companies. If no company name from this list is found, the function tries to detect the company name by a typical suffix such as “Ltd.”, “Inc.”, “Corp.”, etc.
  • $$funcFindCompanyNext – searches for another company name after the position of the first company name found.
  • $$funcFindMaxNumber – finds the largest number in the text that matches the {{Money}} macro (decimal number with optional currency symbol).
  • $$funcFindMaxDate – finds the latest date in the text that matches the {{SmartDate}} macro (dates in the most common national formats).
  • $$funcFindMinDate – finds the earliest date in the text.
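To give a feel for what a function like $$funcFindMaxNumber does, here is a minimal Python sketch. The MONEY pattern is a simplified stand-in for the {{Money}} macro, and find_max_number is a hypothetical helper; neither reflects the Document Parser's actual implementation.

```python
import re

# Simplified stand-in for the {{Money}} macro (an assumption):
# optional currency symbol, digits with optional thousands separators,
# optional decimal part.
MONEY = re.compile(r"[$€£]?\s?(\d[\d,]*(?:\.\d+)?)")

def find_max_number(text):
    """Return the largest money-like number in text, or None if there is none."""
    values = [float(m.group(1).replace(",", "")) for m in MONEY.finditer(text)]
    return max(values) if values else None

# Hypothetical invoice line for the illustration.
sample = "Subtotal $1,200.00  Tax $96.00  Total $1,296.00"
print(find_max_number(sample))
```

On an invoice, the largest amount is usually the total, which is why a function of this kind is handy for locating it without knowing the layout.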

 

How to Extract Text using Search-based Field and Macros?

 

1. In this demonstration, we will find Bill To: Lanny Lane Ltd. To get started, click the Add Object button in the upper left corner and select Field based on search.
  
Add Field Based On Search Object
  
NOTE: You can follow along using the sample documents in the Template Editor. Click the Load Sample Template button to choose a sample document.

2. Under the Selected Object Properties, type Bill To: in the Expression field and click the Insert Macro button.
  
Enter The Text To Search In The Expression Field
  
3. Next, select Spaces. The engine will now look for the text Bill To: followed by one or more spaces.
  
Select Spaces Smart Macro To Add One Or More Spaces
  
4. Click the Insert Macro button again and select SentenceWithSingleSpaces. This time the engine will look for Bill To: followed by one or more spaces, followed in turn by words and symbols separated by single spaces.
  
Select SentenceWithSingleSpaces Smart Macro
  
5. The Expression field now contains Bill To:{{Spaces}}{{SentenceWithSingleSpaces}}. Kindly click on the Test Object button and it will display the result Bill To: Lanny Lane Ltd.
  
Run Test Object To See The Expression Result
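The steps above can be approximated in Python to show how a macro expression might expand into a regular expression. The macro definitions in MACROS, the expand helper, and the sample line are all assumptions made for this sketch; the real Macros are more complex.

```python
import re

# Hypothetical, simplified macro definitions (not the real ones).
MACROS = {
    "Spaces": r"\s+",
    "SentenceWithSingleSpaces": r"\S+(?: \S+)*",
}

def expand(expression):
    """Expand {{Macro}} placeholders; treat everything else as literal text."""
    # re.split with a capturing group alternates literal text and macro names.
    parts = re.split(r"\{\{(\w+)\}\}", expression)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2:                      # odd positions hold macro names
            pieces.append(MACROS[part])
        else:                              # even positions hold literal text
            pieces.append(re.escape(part))
    return "".join(pieces)

# Assumed shape of the sample document line.
text = "Bill To: Lanny Lane Ltd.   Phone: 465-223-5331"
pattern = expand("Bill To:{{Spaces}}{{SentenceWithSingleSpaces}}")
found = re.search(pattern, text)
print(found.group(0))
```

The match stops after Ltd. because the three spaces that follow break the single-space rule of the sentence macro.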
  

How to Extract Text using Search-based Field, Macros and Regex?

  
In the tutorial above, we searched for Bill To: Lanny Lane Ltd. In this next tutorial, we will limit the result to Lanny Lane Ltd. only. All the steps stay the same except for the added Regex step.

Your Expression field should contain Bill To:{{Spaces}}{{SentenceWithSingleSpaces}}. Wrap the {{SentenceWithSingleSpaces}} Macro in parentheses () so that the expression looks like this: Bill To:{{Spaces}}({{SentenceWithSingleSpaces}}).

Everything inside the parentheses will be captured and displayed as a result when the Regex is enabled. Kindly put a check mark in the Regex checkbox and run the Test Object. The result should now show Lanny Lane Ltd.
  
Extract Text Value Using Smart Macros and Regex
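The effect of the parentheses can be sketched in plain Python regex. The macros are replaced here by simplified stand-in patterns (\s+ and \S+(?: \S+)*), which are assumptions, as is the sample line.

```python
import re

# Assumed shape of the sample document line.
text = "Bill To: Lanny Lane Ltd.   Phone: 465-223-5331"

# The label is matched but only the part in parentheses is captured.
pattern = re.compile(r"Bill To:\s+(\S+(?: \S+)*)")
match = pattern.search(text)

whole_match = match.group(0)   # everything the expression matched
captured = match.group(1)      # only the text inside the parentheses

print(whole_match)
print(captured)
```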
 
 

How to Extract Text between Two Points?

  

1. We will now move on to a more complex scenario. In this tutorial we will extract the text between Bill To: and 465-223-5331. We will still use the Search-based field to perform this task.

Kindly click the Add Object button in the upper left corner and select Field based on search. In the Expression field, type Bill To: and click the Insert Macro button.
  
Extract Text Between Two Points Using Field Based Search
  
2. Kindly select the Spaces Macro and click on the Insert Macro button again.
  
Insert Spaces Macro After Bill To
  
3. Next, select the Anything Macro. The Anything Macro will capture all the characters in its way until it finds the next Macro in the expression.
  
Insert The Anything Macro To Capture All Characters Up To The Next Macro
  
4. In the Expression field, wrap the {{Anything}} Macro in parentheses () and name the group Value. The expression will now look like the image below:
  
Add The Name Value To The Captured Group
  
5. Then, we will add the USPhoneNumber Macro which is our stopping point.
  
Insert US Phone Number Macro As The End Macro
  
6. Kindly tick the Regex checkbox and click the Test Object button. The result will show Lanny Lane Ltd.   Phone:, which is the text between the two points Bill To: and 465-223-5331.
  
Enable Regex To Display Captured Group
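Extracting text between two points comes down to a lazy match bounded by a start and an end pattern. The Python sketch below uses simplified stand-ins for the Macros (\s+ for Spaces, .*? for Anything, \d{3}-\d{3}-\d{4} for USPhoneNumber), Python's (?P<value>...) spelling for the named group, and an assumed sample line.

```python
import re

# Assumed shape of the sample document line.
text = "Bill To: Lanny Lane Ltd.   Phone: 465-223-5331"

pattern = re.compile(
    r"Bill To:\s+"             # starting point
    r"(?P<value>.*?)"          # lazily capture everything in between
    r"\d{3}-\d{3}-\d{4}"       # stopping point: a US-style phone number
)
match = pattern.search(text)
between = match.group("value")

print(repr(between))
```

The lazy quantifier matters: it makes the capture stop at the first phone number rather than the last one on the line.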
 
 

How to Extract Text using Rectangle Field, Macros and Regex?

 
 
1. In this tutorial we will show you how to use an expression in a Rectangle field. In the invoice, we will extract all the text inside the Rectangle field except the words BILL TO.

To begin, kindly click the Add Object button in the upper left corner and select Field from Rectangle.

Add Field From Rectangle Object To Extract Text

2. Kindly drag the Rectangle object over the text and click the Insert Macro button.

Drag Rectangle Object Over To Text And Insert Macro

3. Next, select the ToggleToSingleLineMode Macro. This will allow the AnythingGreedy Macro to go over multiple lines.

Select ToggleToSingleLineMode Macro To Search Multi-Line Text

4. In the Expression field, enter the words BILL TO and click the Insert Macro button again.

Enter BILL TO In The Expression Field And Insert Macro

5. Then, select the AnythingGreedy Macro. The AnythingGreedy Macro reads all the characters in its way until the next Macro or until the end of the line. Because we used the ToggleToSingleLineMode Macro, AnythingGreedy will stop only at the end of the Rectangle field.

Select AnythingGreedy Macro To Capture All Characters Up To The Next Macro Or End Of The Line

6. Now, wrap the {{AnythingGreedy}} Macro in parentheses () and add the group name Value to return all the captured characters and symbols. The Expression should now look like the image below.

Add The Group Name Value To Captured Group

7. Kindly tick the Regex checkbox and run the Test Object. The result will show the text inside the Rectangle field after the word BILL TO.

Enable Regex And Run Test Object To See The Result
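The role of single-line mode can be sketched in Python, where the inline flag (?s) lets the dot match line breaks. This is only an analogy for the ToggleToSingleLineMode and AnythingGreedy Macros; the rectangle text, including the address lines, is made up for the illustration.

```python
import re

# Hypothetical text inside the rectangle, spanning several lines.
rectangle_text = "BILL TO\nLanny Lane Ltd.\n12 Example Street\nSpringfield"

# Without single-line mode, the greedy match stops at the end of the line.
default_mode = re.search(r"BILL TO(?P<value>.*)", rectangle_text)

# With single-line mode, the greedy match runs to the end of the text.
single_line = re.search(r"(?s)BILL TO(?P<value>.*)", rectangle_text)

first_line_only = default_mode.group("value")
whole_rectangle = single_line.group("value").strip()

print(repr(first_line_only))
print(repr(whole_rectangle))
```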

 

How to use the Special Functions?

 

1. In this tutorial we will show you how to use the Special Functions to find the Company Names in a document. To get started, click the Add Object button and select Field based on search.

Add Field Based On Search Object To Use Special Function

2. In the Expression field, kindly enter $$funcFindCompany to find the first Company Name and change the object name to Company1 to easily distinguish it from the other objects.

Use The $$funcFindCompany To Find The First Company

3. Kindly add another Search-based field object and enter $$funcFindCompanyNext to find the next Company Name.

Use $$funcFindCompanyNext To Find The Next Company Name

4. Now, click the Run Template button and you will see the two Company Names in the result.

Run Template To See The Result
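The suffix-based fallback of the company functions can be imitated with a short Python sketch. It ignores the embedded company list entirely, and the COMPANY pattern, the find_companies helper, and the second company in the sample text are hypothetical.

```python
import re

# Capitalized words followed by a typical company suffix (simplified).
COMPANY = re.compile(r"(?:[A-Z][\w&]*\s+)+(?:Ltd|Inc|Corp)\.")

def find_companies(text):
    """Return company-like names in the order they appear in text."""
    return [m.group(0) for m in COMPANY.finditer(text)]

# Hypothetical document text; only the first company comes from the sample.
document = "Bill To: Lanny Lane Ltd. Ship From: Acme Supplies Inc."
companies = find_companies(document)

company1 = companies[0]   # analogous to $$funcFindCompany
company2 = companies[1]   # analogous to $$funcFindCompanyNext

print(company1)
print(company2)
```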

Regular Expression Video Tutorials

To dive deeper into Regular Expressions, we have an easy-to-follow Regular Expression for Beginners Video Series.

In the videos you’ll learn how to use pure Regular Expressions, tools for practicing Regex, some of our tools where you can apply Regex, and real-life examples of creating patterns and parsing documents with Regex.

In this tutorial we learned how to extract specific text in a PDF using Macros, Regex, and Special Functions. We familiarized ourselves with the different expressions and their specific functions.

NOTE: To learn how to create a template using the Template Editor, check out this tutorial.
