How to Extract Mathematical Expressions from a PDF in Python Using the PDF.co Web API

Dec 30, 2024·6 Minutes Read

In this tutorial, we will guide you through the process of extracting mathematical expressions from PDF documents using Python and the PDF.co Web API. By following this step-by-step guide, you will learn how to easily extract mathematical expressions and other data from PDF files.


To demonstrate, we will use a sample PDF document and show how you can extract mathematical expressions using the Python requests library to interact with the PDF.co Web API.

Step 1: Install the requests Library

Before we begin, make sure that the requests library is installed in your Python environment. This library is essential for making HTTP requests to the PDF.co Web API.

  • Open your terminal or command line interface (CLI).
  • Run the following command to install the requests module: python -m pip install requests

This command will install the requests library, enabling your Python code to interact with the PDF.co Web API.
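
To confirm that the installation worked, a quick check along these lines can help. It is only a convenience sketch, and the helper name `is_installed` is made up for this illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("requests"):
    print("requests is available")
else:
    print("requests is missing; run: python -m pip install requests")
```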

Step 2: Access the Source Code

Next, you need to prepare the Python script that will handle the extraction of mathematical expressions from the PDF.

  • Copy the sample Python code from the link provided (or use the code example below).
  • Paste the code into your preferred Python code editor, such as Visual Studio Code, PyCharm, or any other Python-compatible editor.

Step 3: Configure the Python Code

Now that you have the sample code, let’s configure it with your specific settings.

API Key:

  • Obtain your API Key from your PDF.co dashboard.
  • Add your API key to the designated spot in the Python script.

Source File:

  • Enter the name of the PDF file from which you want to extract the mathematical expressions.

Output File Name:

  • Choose the desired name for the output file where the extracted data will be saved. This will usually be a JSON file.

Asynchronous Mode:

  • For efficiency, we recommend using Asynchronous Mode for the extraction. In this mode the extraction runs in the background, so the program can continue executing without waiting for it to complete.
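
In practice, asynchronous mode usually means the first request returns a job identifier instead of the finished result, and the script then checks on that job at intervals. The loop below is a minimal sketch of that pattern. The `check_status` callable and the `working` status string are assumptions for illustration, so confirm the actual values against the PDF.co job-status documentation.

```python
import time
from typing import Callable


def wait_for_job(check_status: Callable[[], str],
                 poll_seconds: float = 3.0,
                 max_attempts: int = 20) -> str:
    """Poll check_status() until it stops reporting 'working'.

    check_status is a hypothetical callable that asks the API for the current
    job status and returns it as a string. The function returns the final
    status, or 'timeout' if the job is still working after max_attempts checks.
    """
    for _ in range(max_attempts):
        status = check_status()
        if status != "working":  # assumed in-progress status value
            return status
        time.sleep(poll_seconds)
    return "timeout"
```

Capping the number of attempts keeps the script from waiting forever if a job never finishes.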

Template Name:

  • Specify the template name that will be used to parse and extract the mathematical expressions from the PDF. You can create a new template using the PDF.co Document Parser Template Editor. For a quick guide on creating a template, you can refer to this tutorial.
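
The settings above might be gathered at the top of the script roughly as follows. This is a sketch under stated assumptions, not the official sample: every value is a placeholder, `build_parameters` is a hypothetical helper, and the request key names are modeled on PDF.co's public samples and should be checked against the API documentation.

```python
import os

# All values below are placeholders; replace them with your own settings.
API_KEY = "YOUR_API_KEY"             # from the PDF.co dashboard
SOURCE_FILE = "sample.pdf"           # PDF that contains the expressions
DESTINATION_FILE = "result.json"     # where the extracted data is saved
ASYNC_MODE = True                    # run the extraction as a background job
TEMPLATE_FILE = "MathTemplate.yml"   # hypothetical Document Parser template file


def build_parameters(file_url: str, template_text: str) -> dict:
    """Assemble the request body for the parsing call.

    The key names ('url', 'template', 'outputFormat', 'async', 'name') are
    assumptions, not confirmed by this tutorial; verify them before use.
    """
    return {
        "url": file_url,
        "template": template_text,
        "outputFormat": "JSON",
        "async": ASYNC_MODE,
        "name": os.path.basename(DESTINATION_FILE),
    }
```

Keeping the settings in one place means only these few lines change when you switch to a different PDF or template.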

Step 4: Save Python Program

Once you’ve updated the script with your API Key, PDF file name, and template, save the Python program in your preferred directory. Make sure to use a clear and descriptive file name.

Step 5: Run the Program

Now it’s time to run the program.

  • Execute the Python script. If everything is set up correctly, the script will initiate the extraction process, and you'll receive a generated JSON file containing the extracted data.
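
Under the hood, running the script comes down to sending an authenticated HTTP request and reading the JSON reply. The function below sketches that step. The endpoint URL, the `x-api-key` header, and the `error` and `message` reply fields are assumptions based on PDF.co's public samples, and `submit_extraction` is a hypothetical name.

```python
from typing import Callable, Optional

# Assumed endpoint, taken from PDF.co's public samples; confirm before use.
PARSER_URL = "https://api.pdf.co/v1/pdf/documentparser"


def submit_extraction(parameters: dict, api_key: str,
                      post: Optional[Callable] = None) -> dict:
    """Send the parsing request and return the decoded JSON reply.

    post defaults to requests.post; passing a stand-in makes the function
    easy to try out without network access.
    """
    if post is None:
        import requests  # imported lazily so the sketch loads without it
        post = requests.post
    response = post(PARSER_URL, json=parameters,
                    headers={"x-api-key": api_key}, timeout=60)
    response.raise_for_status()  # surface HTTP-level failures early
    reply = response.json()
    if reply.get("error"):  # assumed error flag in the reply body
        raise RuntimeError(reply.get("message", "extraction request failed"))
    return reply
```

If the script fails at this point, the HTTP status code and the message in the reply are usually the quickest clues to a wrong API key or file name.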

Step 6: View the Extracted Data in JSON Format

After the script finishes running, you can view the output JSON file.

  • Navigate to the directory where the Python script is located.
  • Open the generated JSON file with your preferred JSON viewer or text editor.
  • You should now see the extracted mathematical expressions from your PDF document, organized in a structured JSON format.
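
If you would rather inspect the output from Python than from a viewer, a small helper can list every value in the file. Because the exact layout of the output depends on your template, the sketch below makes no assumptions about field names and simply walks whatever structure it finds. The function names are made up for this example.

```python
import json


def iter_values(node, path=""):
    """Yield (path, value) pairs for every leaf in a nested JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_values(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_values(value, f"{path}[{index}]")
    else:
        yield path, node


def print_result(json_path: str) -> None:
    """Load the output file and print each leaf value with its location."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    for path, value in iter_values(data):
        print(f"{path}: {value}")
```

Calling `print_result("result.json")` prints one line per extracted value, which makes it easy to spot where the expressions landed in the structure.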

In this tutorial, you have learned how to extract mathematical expressions from PDF documents using the PDF.co Web API in Python.