Headless Chrome and the Puppeteer Library for Scraping and Testing the Web

With the advent of Single Page Applications, scraping pages for information and running automated user-interaction tests have become much harder due to their highly dynamic nature. The solution? Headless Chrome and the Puppeteer library.
While Selenium, PhantomJS and others have been around for a long time, headless Chrome and Puppeteer, despite arriving late to the party, make for valuable additions to the toolbox of web testing automation, allowing developers to simulate the interaction of real users with a web site or application.
Headless Chrome is able to run without Puppeteer, as it can be programmatically controlled through the Chrome DevTools Protocol (CDP), typically by launching Chrome with remote debugging enabled and attaching to the running instance:
chrome --headless --disable-gpu --remote-debugging-port=9222
Loading the protocol's sidekick module, 'chrome-remote-interface', which provides a simple abstraction of the CDP's commands and notifications as a straightforward JavaScript API, one can then drive the browser from scripts running under a local Node.js installation.
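To make the CDP approach concrete, here is a minimal sketch along the lines of the module's own documentation; it assumes Chrome is already listening on port 9222, the default that 'chrome-remote-interface' connects to, and simply navigates to https://example.com:
const CDP = require('chrome-remote-interface');
(async () => {
  let client;
  try {
    // Connect to the Chrome instance started above (localhost:9222)
    client = await CDP();
    const {Network, Page} = client;

    // Enable the protocol domains before issuing commands to them
    await Network.enable();
    await Page.enable();

    // Navigate and wait for the page's load event
    await Page.navigate({url: 'https://example.com'});
    await Page.loadEventFired();
  } catch (err) {
    console.error(err);
  } finally {
    if (client) {
      await client.close();
    }
  }
})();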
From Puppeteer's official documentation, here is an example that navigates to https://example.com and saves a screenshot as example.png:
const puppeteer = require('puppeteer');
(async () => {
  // Launch the headless Chrome instance bundled with Puppeteer
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Navigate and capture the rendered page
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
But since there's already 'chrome-remote-interface', what does Puppeteer do differently? Puppeteer offers a higher-level API to the CDP than the one made available by 'chrome-remote-interface': single calls wrap what would otherwise take several raw protocol commands.
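As a rough illustration of that difference, the sketch below scrapes some text from a page: page.title() and page.$eval() each stand in for raw Runtime.evaluate commands issued over the protocol. The 'h1' selector is an assumption for illustration; any CSS selector works:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // High-level one-liners instead of raw CDP commands
  const title = await page.title();
  // Assumes the page has an <h1>; $eval throws if nothing matches
  const heading = await page.$eval('h1', el => el.textContent);

  console.log(title, heading);
  await browser.close();
})();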
