In this step-by-step guide, we’ll learn how to extract data from a PDF in XML format using UiPath. These are the steps we’ll follow.

  1. Install package for UiPath
  2. Configure Scope
  3. Add and Configure PDF to XML Activity
  4. Demo Output


Install package for UiPath

UiPath Studio provides a NuGet-like experience for adding third-party packages. To install the package, follow these steps.

Open the package window by clicking “Manage Packages” in the top ribbon menu.

Package Manager

On the Packages page, go to “Marketplace” and search for “”. You’ll immediately see the package named “”. Choose the latest version of the package, install it, and save.

PDFco Package

With this simple step complete, the package is installed in your UiPath Studio. You’ll now find its activities in the activity panel, as shown in the following image.

PDFco Activities


Configure Scope

Adding and configuring the Scope activity is essential for running the services. The Scope provides a single place to supply the API key, as shown in the following figure.

Add PDFco Scope

Please note that all other activities, such as PDF Merge, PDF Split, PDF Filler, and PDF to XML, must be placed inside the Scope. The Scope holds the API key, and the other activities read the key from it, so they must sit within it. The API key is required to authenticate requests. You can get an API key when you sign up.
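The relationship between the Scope and its nested activities can be illustrated outside UiPath with a small Python analogy. This is only a sketch of the idea, not how the package is implemented: the names scope and pdf_to_xml are hypothetical, and no real request is sent.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Holds the API key of the currently open scope (None outside any scope).
_api_key = ContextVar("api_key", default=None)


@contextmanager
def scope(api_key):
    """Hypothetical stand-in for the Scope activity: shares the key with nested calls."""
    token = _api_key.set(api_key)
    try:
        yield
    finally:
        _api_key.reset(token)  # the key is no longer visible once the scope closes


def pdf_to_xml(source):
    """Hypothetical stand-in for a nested activity; it only checks that a key is available."""
    if _api_key.get() is None:
        raise RuntimeError("pdf_to_xml must run inside scope(...)")
    return {"source": source, "authenticated": True}


with scope("demo-key"):
    print(pdf_to_xml("sample.pdf"))
```

Calling pdf_to_xml outside the with block raises an error, which mirrors why every activity has to be placed inside the Scope.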


Add and Configure PDF to XML Activity

As the name suggests, the “PDF to XML” activity converts a PDF to XML. Drag and drop the PDF to XML activity inside the Scope.

PDFtoXML Activity

For the input PDF, we can give either the URL of the PDF file (as shown in the image above) or a physical path to the PDF file. In this case, the input is a string, so as per UiPath constraints it must be enclosed in double quotation marks (“”).
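If the input location comes from elsewhere (a config file, for instance), a quick check can decide whether it belongs in the PDF File URL or the PDF File Path property. The Python sketch below is illustrative only; the function name choose_input_property is hypothetical, and it assumes that only http and https links count as URLs.

```python
from urllib.parse import urlparse


def choose_input_property(source):
    """Return the property name that fits the given input string."""
    parsed = urlparse(source)
    # Assumption: only http(s) links with a host are treated as URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "PDF File URL"
    return "PDF File Path"


print(choose_input_property("https://example.com/sample.pdf"))
print(choose_input_property(r"C:\docs\sample.pdf"))
```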

The “PDF to XML” activity has many properties for configuring the output as required. Let’s look at them briefly.

PDFtoXML Activity Properties


Property Name | Type | Usage
Destination File Path | Input | Physical path where the resulting XML file is stored.
Extraction Region | Input | If we only need to extract a particular portion of the PDF as XML, we provide its coordinates here. For example, “51, 114, 235, 204”.
Line Grouping | Input | Enable this field to turn on line grouping in table cells.
Pages | Input | Comma-separated list of page indices (or ranges) to process, restricting extraction to those pages. IMPORTANT: the very first page starts at 0 (zero). To set a range, use a dash (-), for example 0,2-5,7-. To set a range from an index to the last page, use a range like 2- (from page #3, since the index starts at zero, to the end of the document). For ALL pages, leave this parameter empty.
Password | Input | When the input PDF is password protected, provide the password here.
PDF File Path | Input | The physical path of the input PDF file.
PDF File URL | Input | URL of the input PDF file.
Profiles | Input | Sets additional options through custom configurations. See profile samples for more examples.
Unwrap | Input | Unwraps lines into a single line within table cells when lineGrouping is enabled.
IsSuccess | Output | Status of the API call: whether it succeeded or not.
Message | Output | Response message, if any.
Output Text | Output | Output XML text.
ResultFileUrl | Output | URL of the output XML result.
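The Pages syntax is easy to get wrong because indices start at zero. The Python sketch below expands a Pages value into the zero-based indices it selects, following the rules described for that property. The function name parse_pages is hypothetical, and silently ignoring out-of-range indices is an assumption, since the behavior for such values is not described here.

```python
from typing import List


def parse_pages(spec: str, page_count: int) -> List[int]:
    """Expand a spec such as "0,2-5,7-" into sorted zero-based page indices."""
    if not spec.strip():
        return list(range(page_count))  # an empty spec means all pages
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "7-" runs to the last page.
            end = int(end_text) if end_text.strip() else page_count - 1
        else:
            start = end = int(part)
        # Assumption: indices beyond the document are ignored rather than rejected.
        pages.update(range(max(start, 0), min(end, page_count - 1) + 1))
    return sorted(pages)


# For a 10-page document, "2-" selects the third page through the last one.
print(parse_pages("2-", 10))
```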


Demo Output

In this sample, I’ve configured one PDF to XML activity within the Scope. For configuration, we have the URL of the input PDF file and the destination path for the output XML.

PDF to XML UiPath Demo

The animation above shows the workflow running in debug mode.
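Once the XML file is saved, a quick way to get a feel for its structure is to count the element tags it contains. The Python sketch below uses only the standard library. The sample string is hypothetical, since the exact element names in the real output are not listed in this article.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(xml_text):
    """Count how often each element tag occurs in an XML string."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


# Hypothetical sample; the real output's element names may differ.
sample = (
    "<document><page><row>"
    "<column>Item</column><column>Price</column>"
    "</row></page></document>"
)
print(summarize_xml(sample).most_common())
```

To inspect the saved file instead of a string, ET.parse can load it from the path given in Destination File Path.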

To get more out of this article, please practice this on your machine. Thank you for reading!
