How to Extract Hyperlinks in PDF with Python using PDF.co Web API
Sep 2, 2024·3 Minutes Read
Python has a large set of libraries for handling many kinds of tasks. To extract data from a PDF, we will use the PDF.co Web API.
In this article, we are going to extract hyperlinks from a PDF in Python using the PDF.co Web API.
We have a sample PDF here and will extract its hyperlinks using Python.
Step 1: Install Requests Module
- First, install the requests module. Type
python -m pip install requests
in your command line.
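Before moving on, it can help to confirm that the package is visible to the interpreter you plan to run the script with. The snippet below is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("requests"):
        print("requests is available")
    else:
        print("requests is missing; run: python -m pip install requests")
```

If the check reports the package as missing even after installing it, the install most likely went to a different Python interpreter than the one running the script.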
Step 2: Python Sample Code
- Next, let's add the Python sample code in the Visual Studio Code editor. You can also use your favorite Python editor. Kindly click this link for the source code.
Step 3: PDF.co API Key
- Then, add the PDF.co API Key. You can get the API Key in your PDF.co dashboard.
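Instead of pasting the key directly into the script, one option is to read it from an environment variable so it stays out of version control. This is a sketch under assumptions: the variable name PDFCO_API_KEY and both helper functions are hypothetical, and the x-api-key header name should be checked against the sample code.

```python
import os

# Hypothetical environment variable name chosen for this sketch.
API_KEY_ENV_VAR = "PDFCO_API_KEY"


def load_api_key(env=None):
    """Read the API key from the environment, failing loudly if it is absent."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} to your PDF.co API key")
    return key


def build_headers(api_key):
    """Build request headers; assumes the key travels in an x-api-key header."""
    return {"x-api-key": api_key}
```

With this in place, the script would call build_headers(load_api_key()) once at startup and reuse the result for each request.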
Step 4: Source File and Destination
- In line 12, input the source PDF file name.
- In line 18, type in your desired JSON output file name.
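A quick sanity check on the two file names can catch typos before any request is sent. The sketch below is only an illustration: the names sample.pdf and result.json and the helper check_paths are placeholders, not values taken from the sample code.

```python
from pathlib import Path

# Placeholder file names; use the ones you entered in the sample code.
SOURCE_FILE = Path("sample.pdf")
DESTINATION_FILE = Path("result.json")


def check_paths(source, destination):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if source.suffix.lower() != ".pdf":
        problems.append(f"{source} does not have a .pdf extension")
    if not source.is_file():
        problems.append(f"{source} was not found")
    if destination.suffix.lower() != ".json":
        problems.append(f"{destination} does not have a .json extension")
    return problems


if __name__ == "__main__":
    for problem in check_paths(SOURCE_FILE, DESTINATION_FILE):
        print("Warning:", problem)
```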
Step 5: Custom Profiles
- In line 56, we set an advanced conversion profile:
{ "OutputStructure": "OnlyLinks", "OutputTransformation": "$..text" }.
It extracts all the links in the PDF.
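To make the profile concrete, here is a rough sketch of how it might be serialized and attached to the request parameters. The profile keys come from this step; the endpoint URL, the parameter names url, name, and profiles, and the helper functions are assumptions that should be checked against the sample code.

```python
import json

# Assumed endpoint; confirm it against the sample code linked in Step 2.
CONVERT_URL = "https://api.pdf.co/v1/pdf/convert/to/json"


def build_profile():
    """Return the advanced conversion profile from this step as a JSON string."""
    profile = {
        "OutputStructure": "OnlyLinks",
        "OutputTransformation": "$..text",
    }
    return json.dumps(profile)


def build_payload(source_url, output_name):
    """Assemble request parameters (parameter names are assumptions)."""
    return {
        "url": source_url,
        "name": output_name,
        "profiles": build_profile(),
    }


if __name__ == "__main__":
    payload = build_payload("https://example.com/sample.pdf", "result.json")
    print(json.dumps(payload, indent=2))
```

The payload would then be passed to requests.post together with CONVERT_URL and a header carrying your API key. Building the profile with json.dumps rather than a hand-written string avoids quoting mistakes.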
Step 6: Run Program
- Run the program. Once it finishes successfully, check your program folder to view the output.
Step 7: Output
- Here are the extracted links in JSON format.
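Once the JSON file is saved, you may want the links as a plain Python list. Since the exact shape of the output is not reproduced here, the sketch below makes no assumption about it: it walks whatever json.load returns and keeps any string that looks like a link. The helper names and the prefix list are illustrative choices.

```python
import json

# Strings starting with one of these are treated as links in this sketch.
URL_PREFIXES = ("http://", "https://", "mailto:")


def collect_links(node):
    """Recursively gather link-like strings from parsed JSON data."""
    links = []
    if isinstance(node, str):
        text = node.strip()
        if text.lower().startswith(URL_PREFIXES):
            links.append(text)
    elif isinstance(node, list):
        for item in node:
            links.extend(collect_links(item))
    elif isinstance(node, dict):
        for value in node.values():
            links.extend(collect_links(value))
    return links


def unique_in_order(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    sample = json.loads('["https://example.com", "not a link", "https://example.com"]')
    print(unique_in_order(collect_links(sample)))
```

To use it on the real output, load the file with json.load and pass the result to collect_links.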
In this article, you learned how to extract hyperlinks from PDF in Python. You also learned how to use PDF.co Web API to extract multiple links from a PDF.