Extract an Article from a Newspaper in Python using PDF.co Web API

About Digital Newspapers

In simple terms, digital newspapers translate to newspapers in electronic form. It provides subscribers and users access to news via digital devices and platforms. A digital newspaper can operate as a standalone electronic news outlet. However, most top news platforms have both physical and digital outlets of their service.

In the past, many traditional newspapers were reluctant to create digital versions of their publications. However, the exponential increase in the use of the internet prompted most news outlets to create online platforms. This transition brought about several opportunities and advantages. Now, newspapers are competing to present the latest news to readers. In addition, digital advertising opportunities have appeared. Therefore, newspapers are exploring various opportunities for making extra income. Similarly, newspaper owners can reach the younger generation who are more inclined towards digital platforms. Likewise, the cost of printing physical newspapers can be eliminated.

Types of Digital Newspapers

Generally speaking, there are two types of digital newspapers. These types of online newspapers are:

  • Web-based Newspaper
  • Electronic Newspaper

Web-based Newspaper

As the name implies, a web-based newspaper is different from a physical publication. It appears on a webpage of the newspaper’s website. When you land on the front page of a web newspaper, you see a combination of photographs, stories, and captions. In most cases, you will see the top stories of the day highlighted at the top of the page. Traditionally, most top newspapers use web-based platforms to present news.

Electronic Newspaper

Electronic Newspapers have the same format as physical newspapers. Therefore, with appropriate printing equipment and paper, an electronic newspaper can be printed into a physical newspaper. As a result, this type of paper contains all the information you will find on the physical print. In most cases, newspaper owners create electronic newspapers for subscribers that prefer to read publications on digital devices.

Extract Article from Newspaper – Step-by-step Guide

  1. Put Files in Folder
  2. Install the Requests Module
  3. Your API Key
  4. Source File and Text File Name
  5. Run Program
  6. Demo

In this demonstration, we will extract an article from a newspaper PDF in Python. We will use the /v1/pdf/convert/to/text endpoint to extract the article. Below is the sample newspaper and the extracted article output in TEXT format.

Sample PDF and Extracted Text from PDF
Sample PDF and Extracted Text from PDF

Step 1: Put Files in Folder

Let’s start by adding our files to the Python program folder. Then, you can download our Python code here.

Step 2: Install the Requests Module

Next, if you haven’t yet, kindly install the requests module. Type the python -m pip install requests in your command line and run to access the pip install requests.

Step 3: Your API Key

Now, open the Python sample code and proceed to line 6. Kindly put your PDF.co API Key in this line. You can get the PDF.co API Key in your PDF.co dashboard.

Your API Key

Step 4: Source File and Text File Name

In lines 12 and 18, enter the source PDF file and type your desired text file name.

Source PDF and PDF File Name

Step 5: Run Program

Let’s run the program and open the Python program folder to see the result.

Python Program Result

Step 6: Demo

Here’s the PDF Extractor Web API in action.

PDF.co PDF to Text Web API in Python
PDF.co PDF Extractor Web API in Python

In this tutorial, you learned how to extract an article from a newspaper PDF in Python using the PDF.co Web API. You learned how to use the PDF Extractor Web API to convert a PDF file to Text. You also learned how to install the requests module.