How to Create PDF from URL/HTML with Puppeteer
In this article, we’ll show how to create a PDF file with Puppeteer in two scenarios: converting a website URL to a PDF file, and converting an HTML file to a PDF file.
We’ll be using Puppeteer for PDF generation. Puppeteer is a feature-rich Node.js library that controls a Chromium browser programmatically, by default in headless mode (without a visible window).
Puppeteer has a built-in function to export a browser page to a PDF. Let’s review both scenarios here.
Convert Website URL to PDF
The following Node.js code converts a URL to PDF using Puppeteer.
const puppeteer = require("puppeteer");

(async () => {
  // Create browser instance
  const browser = await puppeteer.launch();

  // Create a new page
  const page = await browser.newPage();

  // Website URL to export as pdf
  const website_url = 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Keyed_collections';

  // Open URL in current page
  await page.goto(website_url, {
    waitUntil: 'networkidle2'
  });

  // Save PDF File
  await page.pdf({ path: './result_from_url.pdf', format: 'a4' });

  // Close browser instance
  await browser.close();
})();
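Here, we have stored the website URL in the website_url variable, which page.goto uses to navigate the browser page there. The waitUntil: 'networkidle2' option tells Puppeteer to treat navigation as finished once there have been no more than two network connections for at least 500 ms.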
// Open URL in current page
await page.goto(website_url, {
  waitUntil: 'networkidle2'
});
Lastly, we’re exporting the current page to PDF using the page.pdf method.
// Save PDF File
await page.pdf({ path: './result_from_url.pdf', format: 'a4' });
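The steps above can also be wrapped in a reusable function. The following is only a minimal sketch, not part of the original sample: the function name exportUrlToPdf, the 60-second timeout, and the margin values are assumptions chosen for illustration. It expects a browser that was already created with puppeteer.launch(), and it closes the page even if navigation or export fails.

```javascript
// Sketch: export a URL to PDF and always close the page, even on failure.
// `browser` is assumed to be the object returned by puppeteer.launch().
// The function name, timeout, and margins are illustrative assumptions.
async function exportUrlToPdf(browser, url, outputPath) {
  const page = await browser.newPage();
  try {
    // Same wait condition as the sample above, with an explicit
    // navigation timeout in milliseconds.
    await page.goto(url, { waitUntil: "networkidle2", timeout: 60000 });

    await page.pdf({
      path: outputPath,
      format: "a4",
      printBackground: true, // CSS backgrounds are not printed by default
      margin: { top: "15mm", right: "10mm", bottom: "15mm", left: "10mm" },
    });
  } finally {
    // Runs on success and on error, so pages do not pile up.
    await page.close();
  }
}
```

It could be called as await exportUrlToPdf(browser, website_url, './result_from_url.pdf') between the puppeteer.launch() and browser.close() calls shown earlier.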
Convert HTML to PDF
Below is the Node.js source code for converting an HTML file to PDF using Puppeteer.
const puppeteer = require("puppeteer");
const fs = require("fs");

(async () => {
  // Create browser instance
  const browser = await puppeteer.launch();

  // Create a new page
  const page = await browser.newPage();

  // Get HTML content
  const html = fs.readFileSync('./sample.html', 'utf-8');

  // Set HTML as page content
  await page.setContent(html, { waitUntil: 'domcontentloaded' });

  // Save PDF File
  await page.pdf({ path: './result_from_html.pdf', format: 'a4' });

  // Close browser instance
  await browser.close();
})();
Here, we have a sample.html file containing all the HTML we want to export to PDF. First, we read the content of the HTML file and store it in the html variable. Then, this variable is used to set the page content. In other words, it’s similar to opening a local HTML file in a browser.
// Get HTML content
const html = fs.readFileSync('./sample.html', 'utf-8');
// Set HTML as page content
await page.setContent(html, { waitUntil: 'domcontentloaded' });
In the end, we’re exporting the browser page to PDF using the page.pdf method.
// Save PDF File
await page.pdf({ path: './result_from_html.pdf', format: 'a4' });
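The HTML does not have to come from a file, and the PDF does not have to be written to disk. As a rough sketch (the function name htmlToPdfBytes is an assumption, not something from the sample above), the same two calls can take an HTML string and hand back the PDF data in memory, which may be convenient when the PDF is sent in an HTTP response. It relies on page.pdf returning the generated PDF data, and on the path option being optional.

```javascript
// Sketch: render an HTML string to PDF data held in memory.
// `browser` is assumed to be the object returned by puppeteer.launch().
// The function name is an illustrative assumption.
async function htmlToPdfBytes(browser, html) {
  const page = await browser.newPage();
  try {
    // "load" waits for the load event; the sample above uses
    // "domcontentloaded", which resolves earlier.
    await page.setContent(html, { waitUntil: "load" });

    // No `path` option here, so nothing is written to disk and the
    // caller only receives the returned PDF data.
    return await page.pdf({ format: "a4" });
  } finally {
    await page.close();
  }
}
```

A call such as const bytes = await htmlToPdfBytes(browser, '<h1>Invoice</h1>') could then be followed by fs.writeFileSync('./result_from_html.pdf', bytes) if a file is still wanted.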
More about the Puppeteer-based approach
Puppeteer is a great library for browser automation tasks and website scraping. It’s easy to set up locally and offers a rich API.
However, working with Puppeteer in a local development environment is one thing, and deploying it to a production server is considerably harder.
It can be tricky to set up on a production server, as its Node.js package depends on a headless Chromium browser (which usually comes along with the package). In most cases, it requires a dedicated server with a good amount of CPU resources.
Additionally, since this is a Node.js-based library, it cannot be used directly in projects written in other languages or environments.
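One common way to reduce the CPU cost mentioned above is to launch the browser once and reuse it for many PDF jobs, instead of launching a new browser for every request. The sketch below illustrates that idea under stated assumptions: createSharedBrowser is a hypothetical helper, and launch is any function that resolves to a browser, for example () => puppeteer.launch(). It is not a complete production setup.

```javascript
// Sketch: launch the browser lazily, once, and share it between PDF jobs.
// `createSharedBrowser` is a hypothetical helper; `launch` is assumed to be
// a function returning a promise of a browser, e.g. () => puppeteer.launch().
function createSharedBrowser(launch) {
  let browserPromise = null;

  return {
    // Resolves to the shared browser, launching it on first use.
    get() {
      if (!browserPromise) {
        browserPromise = launch().catch((err) => {
          browserPromise = null; // allow a retry on the next call
          throw err;
        });
      }
      return browserPromise;
    },

    // Closes the shared browser if one was launched.
    async close() {
      if (browserPromise) {
        const pending = browserPromise;
        browserPromise = null;
        const browser = await pending;
        await browser.close();
      }
    },
  };
}
```

Each job would then call await shared.get(), open its own page with browser.newPage(), and close only that page when done, while shared.close() is called once at shutdown.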
Alternate PDF.co based approach
PDF.co provides REST-based endpoints to perform various PDF-related tasks. One of the features is HTML to PDF. We can either convert URL to PDF or HTML to PDF.
As PDF.co is a REST-based service, it can easily be consumed by any language/environment. And since it’s hosted on robust, secure servers, we don’t need to worry about expensive hosting.
These are the PDF.co endpoints for HTML to PDF generation.
- URL to PDF
- HTML to PDF
- PDF from HTML Template (PDF Report Generation based on HTML)
Let’s take a look at a curl example of URL to PDF conversion using PDF.co. The x-api-key header carries your PDF.co API key.
curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/from/url' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
  "url": "https://wikipedia.org/wiki/Wikipedia:Contact_us",
  "async": false
}'
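For readers staying in Node.js, the same request can be sketched with the built-in fetch function (available globally in Node.js 18 and later). The endpoint, headers, and body fields are taken from the curl example; the function names are assumptions, and because the response format is not shown in this article, the sketch simply returns the parsed JSON body without relying on any particular field.

```javascript
// Sketch: the URL to PDF request from the curl example, expressed for fetch().
// Function names are illustrative assumptions.
const PDFCO_URL_TO_PDF = "https://api.pdf.co/v1/pdf/convert/from/url";

// Builds the fetch() arguments. Kept separate so the request can be
// inspected without touching the network.
function buildUrlToPdfRequest(apiKey, pageUrl) {
  return {
    endpoint: PDFCO_URL_TO_PDF,
    options: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "Content-Type": "application/json",
      },
      // JSON.stringify never emits a trailing comma, so the body is valid JSON.
      body: JSON.stringify({ url: pageUrl, async: false }),
    },
  };
}

// Sends the request and returns the parsed body as-is.
// Assumes the service answers with JSON.
async function convertUrlToPdf(apiKey, pageUrl) {
  const { endpoint, options } = buildUrlToPdfRequest(apiKey, pageUrl);
  const response = await fetch(endpoint, options);
  if (!response.ok) {
    throw new Error(`PDF.co request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A call might look like const result = await convertUrlToPdf(process.env.PDFCO_API_KEY, 'https://wikipedia.org/wiki/Wikipedia:Contact_us'), where the environment variable name is only a placeholder for wherever the API key is stored.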
In this article, we’ve discussed two ways to generate a PDF from HTML or a URL: Puppeteer and the PDF.co API. Feel free to try these samples on your own machine. Thank you!