How to Extract Images From PDF in Python and Convert Them Back to PDF

In this tutorial, we’ll show you how to extract images from a PDF in Python by converting its pages to PNG files with the PDF.co Web API, and then how to convert those images back to a PDF.

Step 1: Source Code

Create a Python file and add the following source code:

import requests # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "***************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Direct URL of source PDF file.
# You can also upload your own file to PDF.co and use its URL. See the "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
SourceFileURL = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-image/sample.pdf"
# Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
Pages = ""
# PDF document password. Leave empty for unprotected documents.
Password = ""

def main(args = None):
    convertPdfToImage(SourceFileURL)


def convertPdfToImage(sourceFileURL):
    """Converts PDF To Image using PDF.co Web API"""

    # Prepare requests params as JSON
    # See documentation: https://developer.pdf.co
    parameters = {}
    parameters["password"] = Password
    parameters["pages"] = Pages
    parameters["url"] = sourceFileURL

    # Prepare URL for 'PDF To PNG' API request
    url = "{}/pdf/convert/to/png".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, data=parameters, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:

            # Download generated PNG files
            part = 1

            for resultFileUrl in json["urls"]:
                # Download Result File
                r = requests.get(resultFileUrl, stream=True)

                localFileUrl = f"Page{part}.png"

                if r.status_code == 200:
                    with open(localFileUrl, 'wb') as file:
                        for chunk in r:
                            file.write(chunk)
                    print(f"Result file saved as \"{localFileUrl}\" file.")
                else:
                    print(f"Request error: {r.status_code} {r.reason}")

                part = part + 1
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


if __name__ == '__main__':
    main()

Step 2: Install the requests Module

  • If you haven’t installed the requests module yet, install it by running this command on your command line:
python -m pip install requests

Step 3: Add API Key

  • On line 5, insert your API key into the API_KEY variable. You can get your API key from your PDF.co dashboard.
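
If you would rather not hard-code the key in the source file, one option is to read it from an environment variable. The following is a minimal sketch, not part of the original sample: the load_api_key helper and the PDFCO_API_KEY variable name are hypothetical and chosen only for illustration.

```python
import os


def load_api_key(env_var="PDFCO_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch; use any name you like.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your PDF.co API key.")
    return api_key
```

With this in place, line 5 could become API_KEY = load_api_key(), and the key stays out of version control.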

Step 4: URL to Source and Path to Output File

  • On line 12, set SourceFileURL to the URL of your source PDF. The output images are saved in the current folder as Page1.png, Page2.png, and so on.

Step 5: Setup Parameters

  • Set up the parameters for your PDF to PNG conversion.
  • On lines 27-30, define the request parameters you need, such as the password, the pages to process, and the source URL.
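
To get a feel for the Pages string, here is a small local sketch that expands a value such as '0,2-5,7-' into individual page indices. It assumes zero-based indices (as the comment in the source code suggests), inclusive ranges, and that an open-ended range like '7-' runs to the last page. The expand_pages helper is hypothetical and only illustrates the format; the service does its own parsing and may differ in details.

```python
def expand_pages(pages, page_count):
    """Expand a page string such as '0,2-5,7-' into a sorted list of zero-based indices."""
    if not pages.strip():
        # An empty string means "all pages"
        return list(range(page_count))

    selected = set()
    for token in pages.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start = int(start_text)
            # An open-ended range such as '7-' runs to the last page (assumption)
            end = int(end_text) if end_text.strip() else page_count - 1
        else:
            start = end = int(token)
        # Ignore indices beyond the last page of the document
        selected.update(range(start, min(end, page_count - 1) + 1))
    return sorted(selected)
```

For a 10-page document, expand_pages("0,2-5,7-", 10) selects pages 0, 2, 3, 4, 5, 7, 8, and 9 under these assumptions.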

Step 6: Run Program

  • Once your code is ready, run the program. The output files appear in the same folder as your script.
Resulting files

Step 7: Convert the Images Back to PDF

Follow these steps to convert images back to PDF in Python.

URL to Source and Path to Output

  • Add the URLs of your source images and the path for your output PDF file

Add URLs to the image files

Setup Parameters

  • Set up the parameters for your PNG to PDF conversion
Set required parameters
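
The original walkthrough shows these settings only as screenshots, so here is a minimal sketch of what the reverse conversion could look like, modeled on the Step 1 code. It assumes the PDF.co '/pdf/convert/from/image' endpoint, which takes a comma-separated "url" list and a "name" for the output file, and a response that carries "error", "message", and "url" fields like the one in Step 1. The function names are hypothetical; check the documentation at https://developer.pdf.co before relying on the details.

```python
import os

# Placeholder key and base URL, as in Step 1
API_KEY = "***************************************"
BASE_URL = "https://api.pdf.co/v1"


def build_image_to_pdf_request(image_urls, destination_file):
    """Build the endpoint URL and form parameters for an image-to-PDF request."""
    parameters = {
        # Name of the generated PDF file
        "name": os.path.basename(destination_file),
        # Comma-separated list of source image URLs
        "url": ",".join(image_urls),
    }
    return f"{BASE_URL}/pdf/convert/from/image", parameters


def convert_images_to_pdf(image_urls, destination_file):
    """Ask the service to combine the images into one PDF and save it locally."""
    # Imported here so the helper above can be used without the dependency
    import requests  # pip install requests

    url, parameters = build_image_to_pdf_request(image_urls, destination_file)
    response = requests.post(url, data=parameters, headers={"x-api-key": API_KEY})
    if response.status_code != 200:
        print(f"Request error: {response.status_code} {response.reason}")
        return False

    result = response.json()
    if result["error"]:
        # Show the error reported by the service
        print(result["message"])
        return False

    # Download the generated PDF (the "url" field is assumed, see above)
    download = requests.get(result["url"], stream=True)
    if download.status_code != 200:
        print(f"Request error: {download.status_code} {download.reason}")
        return False

    with open(destination_file, "wb") as file:
        for chunk in download:
            file.write(chunk)
    print(f'Result file saved as "{destination_file}" file.')
    return True
```

A call such as convert_images_to_pdf(image_urls, "result.pdf") would then write result.pdf to the current folder. The image_urls list must contain URLs the service can reach, for example the URLs returned in json["urls"] in Step 1 or the URLs of files you uploaded to PDF.co.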

Run Program

  • Once your code is ready, run the program. The output PDF appears in the same folder as your script.
PDF created from image files

In this tutorial, you’ve learned how to convert a PDF to PNG images and convert the PNG images back to a PDF in Python with the PDF.co Web API.