How to Create PDF from URL/HTML with Puppeteer

In this article, we’ll demonstrate how to create a PDF file in two scenarios: converting a website URL to a PDF file, and converting an HTML file to a PDF file.

We’ll be using Puppeteer for PDF generation. Puppeteer is a feature-rich Node.js library that controls a Chrome or Chromium browser programmatically, by default in headless mode (without a visible window).

Puppeteer has a built-in function to export a browser page to a PDF. Let’s review both scenarios here.

Convert Website URL to PDF

The following Node.js code converts a URL to a PDF using Puppeteer.

const puppeteer = require("puppeteer");
 
(async () => {
 
    // Create browser instance
    const browser = await puppeteer.launch();
 
    // Create a new page
    const page = await browser.newPage();
 
    // Website URL to export as pdf
    const website_url = 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Keyed_collections';
 
    // Open URL in current page
    await page.goto(website_url, {
        waitUntil: 'networkidle2'
    });
 
    // Save PDF File
    await page.pdf({ path: './result_from_url.pdf', format: 'a4' });
 
    // Close browser instance
    await browser.close();
})();

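Here, we store the target URL in the website_url variable and pass it to page.goto to navigate the page there. The waitUntil: 'networkidle2' option makes Puppeteer consider navigation finished once there have been no more than two network connections for at least 500 ms, so most of the page’s resources have loaded before we export it.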

// Open URL in current page
await page.goto(website_url, {
    waitUntil: 'networkidle2'
});

Lastly, we’re exporting the current page to PDF using the page.pdf method.

// Save PDF File
await page.pdf({ path: './result_from_url.pdf', format: 'a4' });
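
The page.pdf method accepts more options than path and format. As a minimal sketch, assuming the defaults below suit your document, the steps above could be wrapped in a reusable function. The helper names buildPdfOptions and exportUrlToPdf are hypothetical, and the margin values are only illustrative; printBackground, margin, and landscape are page.pdf options.

```javascript
// Hypothetical helper: builds the options object passed to page.pdf().
// The defaults are illustrative assumptions, not values from the article.
function buildPdfOptions(outputPath, overrides = {}) {
    return {
        path: outputPath,          // where the PDF is written
        format: 'A4',              // paper size
        printBackground: true,     // include CSS backgrounds (off by default)
        margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
        ...overrides               // e.g. { landscape: true }
    };
}

// Hypothetical helper: opens the URL and exports it with the options above.
async function exportUrlToPdf(websiteUrl, outputPath, overrides = {}) {
    // Loaded lazily so buildPdfOptions can be used without Puppeteer installed
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto(websiteUrl, { waitUntil: 'networkidle2' });
        await page.pdf(buildPdfOptions(outputPath, overrides));
    } finally {
        // Close the browser even if navigation or export fails
        await browser.close();
    }
}
```

For example, exportUrlToPdf(website_url, './result_from_url.pdf', { landscape: true }) would produce a landscape version of the same page. The try/finally block is a design choice so that a failed navigation does not leave a browser process running.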

Convert HTML to PDF

Below is the Node.js source code for converting an HTML file to PDF using Puppeteer.

const puppeteer = require("puppeteer");
const fs = require("fs");
 
(async () => {
 
    // Create browser instance
    const browser = await puppeteer.launch();
 
    // Create a new page
    const page = await browser.newPage();
 
    // Get HTML content
    const html = fs.readFileSync('./sample.html', 'utf-8');
 
    // Set HTML as page content
    await page.setContent(html, { waitUntil: 'domcontentloaded' });
 
    // Save PDF File
    await page.pdf({ path: './result_from_html.pdf', format: 'a4' });
 
    // Close browser instance
    await browser.close();
})();

Here, we have a sample.html file containing all the HTML we want to export to PDF. First, we read the content of the file and store it in the html variable. Then we use this variable to set the page content. In other words, it’s similar to opening a local HTML file in a browser.

// Get HTML content
const html = fs.readFileSync('./sample.html', 'utf-8');

// Set HTML as page content
await page.setContent(html, { waitUntil: 'domcontentloaded' });

In the end, we’re exporting the browser page to PDF using the page.pdf method.

// Save PDF File
await page.pdf({ path: './result_from_html.pdf', format: 'a4' });
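
One thing to keep in mind: page.setContent loads the HTML without a file location, so relative references inside sample.html (for example, a stylesheet or an image in the same folder) may not resolve. A possible alternative, sketched below under that assumption, is to open the file through a file:// URL with page.goto. The helper names toFileUrl and exportHtmlFileToPdf are hypothetical.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: converts a local path into an absolute file:// URL
function toFileUrl(htmlPath) {
    return pathToFileURL(path.resolve(htmlPath)).href;
}

// Hypothetical helper: opens the local HTML file by URL and exports it to PDF
async function exportHtmlFileToPdf(htmlPath, outputPath) {
    // Loaded lazily, only when an export is actually requested
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        // Navigating to the file lets relative CSS and image paths resolve
        await page.goto(toFileUrl(htmlPath), { waitUntil: 'networkidle0' });
        await page.pdf({ path: outputPath, format: 'A4' });
    } finally {
        // Close the browser even if loading or export fails
        await browser.close();
    }
}
```

Calling exportHtmlFileToPdf('./sample.html', './result_from_html.pdf') would then cover the same scenario as the script above. If the HTML only exists as a string (for instance, rendered from a template), page.setContent remains the simpler choice.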


More About the Puppeteer-Based Approach

Puppeteer is a great library for browser automation and website scraping. It’s easy to set up locally and offers a rich feature set.

However, working with Puppeteer in a local development environment is considerably easier than deploying it to a production server.

It can be tricky to set up on a production server because the Node.js package depends on a headless Chromium browser (which is usually downloaded along with the package). In most cases, it requires a dedicated server with a good amount of CPU resources.

Additionally, since this is a Node.js-based library, it cannot be used directly in projects written in other languages or environments.
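
To make the production concerns above more concrete, here is a minimal sketch of launch options that are often used when Puppeteer runs inside a Linux container. It is an assumption about a typical setup rather than a recommendation: the --no-sandbox flags disable Chromium’s sandbox and should only be considered when the rendered content is trusted. The helper name buildLaunchOptions and the IN_CONTAINER variable are hypothetical; headless, executablePath, and args are puppeteer.launch options.

```javascript
// Hypothetical helper: builds puppeteer.launch() options from environment variables.
function buildLaunchOptions(env = process.env) {
    const options = { headless: true, args: [] };

    // Assumption: the deployment provides the path to a system-installed browser
    if (env.PUPPETEER_EXECUTABLE_PATH) {
        options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
    }

    // IN_CONTAINER is a made-up variable for this sketch
    if (env.IN_CONTAINER === 'true') {
        options.args.push(
            '--no-sandbox',             // disables the Chromium sandbox (trusted content only)
            '--disable-setuid-sandbox',
            '--disable-dev-shm-usage'   // avoid relying on a small /dev/shm
        );
    }
    return options;
}

// Usage (requires Puppeteer to be installed):
// const browser = await require('puppeteer').launch(buildLaunchOptions());
```

Keeping the options in one function makes it easier to run the same script locally with the defaults and on a server with stricter settings.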

Alternative PDF.co-Based Approach

PDF.co provides REST-based endpoints to perform various PDF-related tasks. One of its features is HTML to PDF conversion: we can convert either a URL or raw HTML to PDF.

As PDF.co is a REST-based service, it can easily be consumed by any language/environment. And since it’s hosted on robust, secure servers, we don’t need to worry about expensive hosting.

These are the PDF.co endpoints for HTML to PDF generation.

  1. URL to PDF
  2. HTML to PDF
  3. PDF from HTML Template (PDF Report Generation based on HTML)

Let’s take a look at a cURL example of URL to PDF conversion using PDF.co.

curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/from/url' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://wikipedia.org/wiki/Wikipedia:Contact_us",
    "async": false
}'
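
Because the endpoint is plain HTTP, the same request can also be sent from Node.js. The following is a minimal sketch, assuming Node.js 18 or later (which ships a global fetch) and an API key stored in an environment variable. The helper names buildUrlToPdfRequest and convertUrlToPdf and the PDFCO_API_KEY variable are hypothetical. The sketch makes no assumption about the response fields and simply returns the parsed JSON.

```javascript
// Endpoint taken from the cURL example above
const PDFCO_URL_TO_PDF = 'https://api.pdf.co/v1/pdf/convert/from/url';

// Hypothetical helper: builds the arguments for fetch()
function buildUrlToPdfRequest(apiKey, pageUrl) {
    return {
        endpoint: PDFCO_URL_TO_PDF,
        init: {
            method: 'POST',
            headers: {
                'x-api-key': apiKey,
                'Content-Type': 'application/json'
            },
            // JSON.stringify always produces valid JSON for the request body
            body: JSON.stringify({ url: pageUrl, async: false })
        }
    };
}

// Hypothetical helper: sends the request and returns the parsed JSON response.
// PDFCO_API_KEY is a made-up environment variable name for this sketch.
async function convertUrlToPdf(pageUrl, apiKey = process.env.PDFCO_API_KEY) {
    const { endpoint, init } = buildUrlToPdfRequest(apiKey, pageUrl);
    const response = await fetch(endpoint, init);
    if (!response.ok) {
        throw new Error(`PDF.co request failed with HTTP status ${response.status}`);
    }
    return response.json();
}
```

Splitting the request construction from the network call is a design choice: the payload can be inspected or unit-tested without spending API credits.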

In this article, we’ve discussed two ways to generate a PDF from HTML or a URL: Puppeteer and the PDF.co API. Feel free to try these samples on your own machine. Thank you!