How to Extract Text from a PDF and Paste It into Excel Using Python and the PDF.co Web API
In this tutorial, we will walk you through extracting text from a PDF document and saving it to an Excel file using Python and the PDF.co Web API. By following this step-by-step guide, you’ll learn how to convert data from PDF files into Excel format.
For this demonstration, we will convert a sample PDF document into an Excel file.
Step 1: Install the requests Library
Before we begin, make sure that the requests library is installed in your Python environment. This library is essential for making HTTP requests to the PDF.co Web API.
- Open your terminal or command line interface (CLI).
- Run the following command to install the requests module:
python -m pip install requests
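If you want to confirm the installation from inside Python, the minimal check below uses only the standard library’s importlib. The helper name is_installed is our own illustration, not part of requests or PDF.co.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("requests"):
        import requests
        print("requests", requests.__version__, "is installed")
    else:
        print("requests is missing; run: python -m pip install requests")
```

Run it with the same interpreter you will use for the tutorial script, since each Python environment has its own set of installed packages.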
Step 2: Access the Source Code
Next, prepare the Python script that will handle the conversion of the PDF file into Excel format.
- Copy the sample Python code from the link provided.
- Paste the code into your preferred Python code editor, such as Visual Studio Code, PyCharm, or any other Python-compatible editor.
Step 3: Configure the Python Code
With the sample code in hand, let’s configure it to suit your specific settings.
API Key:
- Obtain your API Key from your PDF.co dashboard.
- Insert your API key into the designated section in the Python script.
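Rather than pasting the key directly into the script, you can keep it out of your source code by reading it from an environment variable. This is a minimal sketch under two assumptions: the variable name PDFCO_API_KEY is hypothetical (use any name you like), and the x-api-key header name reflects PDF.co’s published samples as we understand them, so confirm it in the PDF.co API documentation.

```python
import os


def build_headers(env_var: str = "PDFCO_API_KEY") -> dict:
    """Build HTTP headers using an API key read from an environment variable.

    PDFCO_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your PDF.co API key")
    # Assumption: PDF.co reads the key from an "x-api-key" request header.
    return {"x-api-key": api_key}
```

Keeping the key in the environment also means you can share or commit the script without exposing your credentials.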
Source File:
- Provide the name of the PDF file whose data you want to extract and convert to Excel.
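Catching a wrong file name before any request is sent saves a round trip to the API. The sketch below, with a hypothetical helper check_source_pdf, verifies that the file exists and begins with the %PDF- header bytes that a well-formed PDF file starts with. It is a local sanity check only; it does not upload the file.

```python
from pathlib import Path


def check_source_pdf(path: str) -> Path:
    """Fail early if the source file is missing or does not look like a PDF."""
    pdf_path = Path(path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"Source file not found: {pdf_path}")
    # A well-formed PDF begins with the header bytes "%PDF-" (e.g. "%PDF-1.7").
    with pdf_path.open("rb") as f:
        if f.read(5) != b"%PDF-":
            raise ValueError(f"{pdf_path} does not start with a PDF header")
    return pdf_path
```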
Output Excel Name:
- Specify the name of the output Excel file where the extracted data will be saved.
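If you would rather derive the output name from the source file than type it separately, a small helper like the hypothetical choose_output_path below does that and refuses to overwrite an existing file unless asked. Whether the extension should be .xlsx or .xls depends on which conversion endpoint your copy of the sample code calls, so treat the default as an assumption.

```python
from pathlib import Path


def choose_output_path(source_pdf: str, extension: str = ".xlsx",
                       overwrite: bool = False) -> Path:
    """Derive the Excel output path from the source PDF path.

    Example: "reports/sample.pdf" -> "reports/sample.xlsx".
    """
    out = Path(source_pdf).with_suffix(extension)
    if out.exists() and not overwrite:
        raise FileExistsError(f"{out} already exists; pass overwrite=True to replace it")
    return out
```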
Asynchronous Mode:
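- We recommend using Asynchronous Mode, especially for large documents. In this mode the conversion runs as a background job on the PDF.co server: the API responds right away with a job ID, and the script checks the job status until the output file is ready, instead of holding a single request open until the conversion finishes (which can time out for large files).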
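In asynchronous mode the script has to wait for the background job to finish before it can download the result. The sketch below shows a generic polling loop. It assumes, as in PDF.co’s published samples as we understand them, that the conversion request is sent with async set to true, that the response includes a job ID, and that a job-status request (/v1/job/check) reports "working" until the job ends; check_status is a hypothetical wrapper you would write around that status request. Verify the endpoint and status names in the PDF.co API documentation.

```python
import time
from typing import Callable


def wait_for_job(check_status: Callable[[str], str], job_id: str,
                 interval: float = 3.0, timeout: float = 300.0) -> str:
    """Poll a background job until it leaves the "working" state.

    check_status is a hypothetical callable returning the job's status string.
    Returns the final status (e.g. "success" or "failed"); the caller decides
    what to do with it. Raises TimeoutError if the job runs too long.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = check_status(job_id)
        if status != "working":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still working after {timeout} s")
        time.sleep(interval)  # wait before asking again to avoid hammering the API
```

Passing the status check in as a callable keeps the loop independent of any HTTP library and lets you test it with a stub.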
Step 4: Save the Python Program
Once you’ve updated the script with your settings, save the Python program to your preferred directory.
Step 5: Run the Program
Now it’s time to run the program.
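- Execute the Python script, for example by running python followed by the script’s file name in your terminal. If everything is set up correctly, the script will start the extraction, and the generated Excel file containing the extracted data will be saved under the output name you specified.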
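To make clear what the sample script does when it runs, here is a condensed, synchronous sketch of the request flow. It assumes, based on PDF.co’s published samples as we understand them, that the conversion endpoint is https://api.pdf.co/v1/pdf/convert/to/xlsx, that it accepts a JSON body with url, name and async fields, and that a successful response carries an error flag set to false and a temporary download link in url. Verify all of these against the PDF.co API documentation before relying on them. The source PDF must be reachable by URL; uploading a local file is a separate step not shown here. The session argument is any object with requests-style post() and get() methods, which keeps the function testable without network access.

```python
from pathlib import Path

# Assumed endpoint; confirm it (and .xls vs .xlsx) in the PDF.co API documentation.
CONVERT_URL = "https://api.pdf.co/v1/pdf/convert/to/xlsx"


def convert_pdf_to_excel(session, api_key: str, source_url: str, output_path: str) -> Path:
    """Ask PDF.co to convert a PDF (given by URL) to Excel, then download the result.

    session: any object with requests-style post()/get(), e.g. requests.Session().
    """
    payload = {
        "url": source_url,               # URL of the source PDF
        "name": Path(output_path).name,  # file name for the generated workbook
        "async": False,                  # synchronous: the result comes back in this call
    }
    resp = session.post(CONVERT_URL, json=payload,
                        headers={"x-api-key": api_key}, timeout=120)
    resp.raise_for_status()
    result = resp.json()
    if result.get("error"):
        raise RuntimeError(result.get("message", "PDF.co reported an error"))

    # Assumption: "url" in the response is a temporary link to the generated file.
    download = session.get(result["url"], timeout=120)
    download.raise_for_status()
    out = Path(output_path)
    out.write_bytes(download.content)
    return out
```

With requests installed, a call might look like convert_pdf_to_excel(requests.Session(), api_key, "https://example.com/sample.pdf", "sample.xlsx"), where the URL is a placeholder for your own file.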
Step 6: View the Extracted Data in Excel Format
Once the script has finished running, you can access the output Excel file.
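- Navigate to the folder where the output file was written. If you specified a relative file name, this is the current working directory from which you ran the script, which is usually the folder where the script is saved.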
- The extracted data from your PDF document will now be available in Excel format.
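Before opening the file in Excel, you can sanity-check it from Python. An .xlsx workbook is a ZIP archive whose worksheets are stored under xl/worksheets/, so the standard library’s zipfile module is enough to confirm the download is a real workbook and see how many sheets it contains. The helper list_worksheets is hypothetical and applies only to .xlsx output; a legacy .xls file uses a different binary format.

```python
import zipfile
from pathlib import Path


def list_worksheets(xlsx_path: str) -> list:
    """Return the worksheet part names inside an .xlsx file, sorted.

    An .xlsx workbook is a ZIP archive; each sheet is an XML part under xl/worksheets/.
    """
    path = Path(xlsx_path)
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not an .xlsx (ZIP) file")
    with zipfile.ZipFile(path) as zf:
        # Skip relationship files such as xl/worksheets/_rels/sheet1.xml.rels.
        return sorted(
            name for name in zf.namelist()
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )
```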
In this tutorial, you learned how to extract PDF contents into Excel with Python by calling the PDF.co PDF Extractor Web API, and how to get started right away using the Python sample code.