How to Extract Hyperlinks from a PDF with Python using PDF.co Web API
Python has a large ecosystem of libraries for many kinds of tasks. To extract data from a PDF, we will use the PDF.co Web API.
In this article, we are going to extract hyperlinks from a PDF in Python using the PDF.co Web API.
We start with a sample PDF and extract its hyperlinks using Python.
![Sample PDF with Hyperlinks](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F76456fd72cb262e745bd77f730df5f36b97aa248-999x748.png&w=2048&q=75)
Step 1: Install Requests Module
- First, install the requests module. Type
python -m pip install requests
in your command line.
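To confirm the installation worked before moving on, a quick check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for this illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found by Python."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("requests"):
        print("requests is available")
    else:
        print("requests is missing; run: python -m pip install requests")
```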
Step 2: Python Sample Code
- Next, let’s open the Python sample code in the Visual Studio Code editor. You can also use any other editor you prefer. Kindly click this link for the source code.
Step 3: PDF.co API Key
- Then, add the PDF.co API Key. You can get the API Key in your PDF.co dashboard.
![PDF.co API Key](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F8923a4128112e6eebd2badc49e5050fbb1030892-855x187.png&w=1920&q=75)
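Rather than pasting the key directly into the script, you may prefer to read it from an environment variable. The sketch below assumes the key is sent in an `x-api-key` request header, and `PDFCO_API_KEY` is a hypothetical variable name chosen for this example, not something PDF.co requires.

```python
import os


def build_headers(api_key: str) -> dict:
    """Build the HTTP headers sent with each PDF.co request."""
    if not api_key:
        raise ValueError("PDF.co API key is empty")
    # Assumption: PDF.co reads the key from the "x-api-key" header.
    return {"x-api-key": api_key}


if __name__ == "__main__":
    # PDFCO_API_KEY is a hypothetical variable name used only in this sketch.
    key = os.environ.get("PDFCO_API_KEY", "")
    try:
        headers = build_headers(key)
        print("Headers ready:", list(headers))
    except ValueError as exc:
        print(exc)
```

Keeping the key out of the source file makes it harder to leak it by accident when sharing the script.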
Step 4: Source File and Destination
- In line 12, input the source PDF file name.
- In line 18, type in your desired JSON output file name.
![Source File and Destination](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F540336480b4f12942f758d980877331d001c5119-905x267.png&w=1920&q=75)
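If you would like the output name to follow the source name automatically, something like the following could work. It is only a sketch: `resolve_paths` is a made-up helper, and `sample.pdf` is a placeholder file name rather than the one used in the sample code.

```python
from pathlib import Path
from typing import Tuple


def resolve_paths(source: str, destination: str = "") -> Tuple[Path, Path]:
    """Return (source, destination) paths for the conversion.

    If no destination is given, reuse the source name with a .json suffix.
    """
    src = Path(source)
    if src.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {source}")
    dst = Path(destination) if destination else src.with_suffix(".json")
    return src, dst


if __name__ == "__main__":
    src, dst = resolve_paths("sample.pdf")  # placeholder file name
    print(f"Reading {src}, writing {dst}")
```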
Step 5: Custom Profiles
- In line 56, we set the following advanced conversion profile:
{ "OutputStructure": "OnlyLinks", "OutputTransformation": "$..text" }.
This profile extracts all the links in the PDF.
![Advanced Conversion Profile](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F3935ea64d35442d8b90f3fa4b20a8d38bcfe1f3d-1118x146.png&w=3840&q=75)
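The profile shown in this step is passed to the API as a JSON string. A minimal sketch of building the request parameters is given below; the parameter names `url`, `name`, and `profiles` are assumptions about the PDF to JSON endpoint, and the example URL is a placeholder.

```python
import json

# The profile from this step: keep only links and return their text values.
LINKS_PROFILE = {
    "OutputStructure": "OnlyLinks",
    "OutputTransformation": "$..text",
}


def build_parameters(file_url: str, output_name: str) -> dict:
    """Build the form parameters for the conversion request."""
    return {
        "url": file_url,       # assumed name: location of the source PDF
        "name": output_name,   # assumed name: file name for the JSON output
        # The profile is serialized to a string, not sent as a nested object.
        "profiles": json.dumps(LINKS_PROFILE),
    }


if __name__ == "__main__":
    params = build_parameters("https://example.com/sample.pdf", "result.json")
    print(params["profiles"])
```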
Step 6: Run Program
- Run the program. Once it finishes successfully, check your program folder to view the output.
![Run Program](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F9ce58ec372eb48df64d3e21a4c6b47174a88276a-933x177.png&w=1920&q=75)
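A run can fail for reasons such as an invalid key or an unreachable file, so it helps to check the reply before writing anything to disk. The sketch below assumes the API reply is a JSON object with an `error` flag, an optional `message`, and a `url` pointing to the generated file; both helper names are hypothetical.

```python
from pathlib import Path


def result_url(status_code: int, body: dict) -> str:
    """Return the URL of the generated JSON file, or raise on failure."""
    # Assumption: a successful reply has "error" set to false and a "url" field.
    if status_code != 200 or body.get("error"):
        raise RuntimeError(body.get("message", "PDF.co request failed"))
    return body["url"]


def save_output(content: bytes, destination: str) -> Path:
    """Write the downloaded bytes to the destination file and return its path."""
    path = Path(destination)
    path.write_bytes(content)
    return path
```

With `requests`, `status_code` and `body` would come from `response.status_code` and `response.json()`, and `content` from the `.content` attribute of a follow-up `requests.get(...)` on the returned URL.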
Step 7: Output
- Here are the extracted links in JSON format.
![Extracted Multiple Links](/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F2niroq9z%2Fproduction%2F9f6d0f5dc1f74dcc1dc93c985da31cd32596ee3e-817x279.png&w=1920&q=75)
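Once the JSON file is saved, you can load the links back into Python for further processing. The sketch below assumes the output is a flat JSON array of strings, which may differ depending on the profile you use; the sample links are made up for illustration.

```python
import json


def load_links(json_text: str) -> list:
    """Parse the output and return the unique links in their original order."""
    data = json.loads(json_text)
    # Assumption: the output is a flat JSON array of link strings.
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of links")
    seen = set()
    links = []
    for item in data:
        if isinstance(item, str) and item not in seen:
            seen.add(item)
            links.append(item)
    return links


if __name__ == "__main__":
    # Made-up sample data, not the real output of the sample PDF.
    sample = '["https://example.com/a", "https://example.com/b", "https://example.com/a"]'
    for link in load_links(sample):
        print(link)
```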
In this article, you learned how to extract hyperlinks from a PDF in Python. You also learned how to use the PDF.co Web API to extract multiple links from a PDF.