Extract Hyperlinks from a PDF Using Python and PDF.co Web API
In this guide, we will demonstrate how to extract hyperlinks from a PDF file using Python and the PDF.co Web API. We will work with a sample PDF document to retrieve the hyperlinks.
[Image: Sample PDF file with hyperlinks]
Step 1: Install the requests Library
To begin, we need to install the requests module, which will help us make HTTP requests to the PDF.co API. In your command line or terminal, type the following command and press Enter to install the requests library:
python -m pip install requests
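Before moving on, it can help to confirm that the installation worked. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of the PDF.co sample code.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("requests"):
        print("requests is available")
    else:
        print("requests is missing; run: python -m pip install requests")
```

If the script reports that requests is missing, make sure the pip command was run with the same Python interpreter that you use to run the program.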
Step 2: Source Code Samples
Next, copy the Python sample code from this link. Then, paste the code into your editor (e.g., Visual Studio Code, PyCharm, or any editor of your choice).
Step 3: Configure the Python Code
Now, let's set up the Python code with your specific configurations:
- API Key: Insert your API Key in the designated area within the code. You can find your API Key on your PDF.co Dashboard.
- Source File: Specify the name of the PDF file from which you want to extract hyperlinks.
- Output File Name: Enter the desired name for the output JSON file.
- Profiles: We will use an advanced conversion profile with the following settings:
  { "OutputStructure": "OnlyLinks", "OutputTransformation": "$..text" }
  This configuration will extract all hyperlinks present in the PDF.
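The settings listed above could be gathered in one place along the following lines. This is a minimal sketch, not the official sample: the variable names, the placeholder values, the helper build_parameters, and the request field names ("name", "url", "async", "profiles") are assumptions, so check them against the sample code you copied in Step 2.

```python
import json

# Hypothetical placeholder values; replace them with your own.
API_KEY = "YOUR_PDF_CO_API_KEY"   # from your PDF.co Dashboard
SOURCE_FILE = "sample.pdf"        # PDF to extract hyperlinks from
OUTPUT_FILE = "result.json"       # name of the JSON file to write

# The profile from this step, kept as a dict so json.dumps always yields valid JSON.
PROFILES = {"OutputStructure": "OnlyLinks", "OutputTransformation": "$..text"}


def build_parameters(file_url, output_name=OUTPUT_FILE, run_async=True):
    """Assemble the conversion request body (field names are assumptions)."""
    return {
        "name": output_name,
        "url": file_url,
        "async": run_async,
        # Assumption: the profile is sent as a JSON-encoded string.
        "profiles": json.dumps(PROFILES),
    }


if __name__ == "__main__":
    print(build_parameters("https://example.com/sample.pdf"))
```

Building the profile from a dict instead of typing the JSON string by hand avoids quoting mistakes when you later change a setting.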
For this demonstration, we will use asynchronous mode for the conversion. In this mode the conversion is processed in the background while the program checks on its progress, which is useful for larger documents that take longer to convert.
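Asynchronous processing usually means starting a job and then polling until it finishes. The sketch below shows only that polling pattern. The function wait_for_job, the check_status callable, and the status strings "working" and "success" are assumptions for illustration; in a real program check_status would wrap whatever HTTP call the sample code makes to query the job status.

```python
import time


def wait_for_job(check_status, job_id, interval_seconds=3.0, max_attempts=100,
                 sleep=time.sleep):
    """Poll check_status(job_id) until the job succeeds, fails, or attempts run out.

    check_status is a caller-supplied function that returns a status string.
    """
    for _ in range(max_attempts):
        status = check_status(job_id)
        if status == "success":
            return True
        if status != "working":
            # Any status other than "working" or "success" is treated as a failure.
            raise RuntimeError(f"Job {job_id} ended with status {status!r}")
        sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} checks")


if __name__ == "__main__":
    # Simulated job that reports "working" twice and then "success".
    fake_statuses = iter(["working", "working", "success"])
    finished = wait_for_job(lambda job_id: next(fake_statuses), "demo-job",
                            interval_seconds=0.1)
    print("finished:", finished)
```

Passing the status check in as a function keeps the loop independent of the HTTP details and makes it easy to try out without calling the API.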
[Image: Configure the Python Code]
Step 4: Save the Python Program
Once you have configured the code settings, save the Python program in your preferred directory.
[Image: Save the Python Program]
Step 5: Execute the Program
After saving the program, execute the Python script. If everything is set up correctly, the program will extract the hyperlinks from your PDF document. Once the execution is complete, open the folder where you saved the Python program to find the generated JSON file.
[Image: Execute the Program]
Step 6: View JSON Result
Finally, open the output JSON file in your preferred JSON viewer. You will see the extracted hyperlinks neatly formatted in JSON.
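If you would rather inspect the result from Python than from a JSON viewer, a short script can list the links. This is a sketch under assumptions: the exact shape of the output depends on the profile, so the hypothetical helper collect_links walks the whole structure and keeps any string that looks like a link, and result.json is a placeholder file name.

```python
import json


def collect_links(node, prefixes=("http://", "https://", "mailto:")):
    """Recursively gather strings that start with a link-like prefix."""
    found = []
    if isinstance(node, str):
        if node.startswith(prefixes):
            found.append(node)
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_links(item, prefixes))
    elif isinstance(node, dict):
        for value in node.values():
            found.extend(collect_links(value, prefixes))
    return found


def load_links(path):
    """Read a JSON file and return the link-like strings it contains."""
    with open(path, encoding="utf-8") as handle:
        return collect_links(json.load(handle))


if __name__ == "__main__":
    # "result.json" is a placeholder; use the output file name you configured.
    for link in load_links("result.json"):
        print(link)
```

The prefix tuple is only a heuristic; extend it if your document contains other link schemes.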
[Image: View JSON Result]
In this article, you learned how to extract hyperlinks from a PDF in Python with PDF.co Web API.