Headless Chrome and the Puppeteer Library for Scraping and Testing the Web

With the advent of Single Page Applications, scraping pages for information and running automated user-interaction tests have both become much harder, because the content of such pages is highly dynamic. The solution? Headless Chrome and the Puppeteer library.
While there have always been Selenium, PhantomJS and others, and although headless Chrome and Puppeteer arrived late to the party, they are valuable additions to the set of web test automation tools that let developers simulate the interaction of real users with a web site or application.
Headless Chrome can run without Puppeteer, since it can be controlled programmatically through the Chrome DevTools Protocol, typically by attaching to a Chrome instance started with remote debugging enabled:
chrome --headless --disable-gpu --remote-debugging-port=9222
With Chrome listening on that port, one can load the protocol's sidekick module 'chrome-remote-interface', which provides a simple abstraction of the protocol's commands and notifications through a straightforward JavaScript API, and drive the browser from JavaScript scripts run under a local Node.js installation.
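As a rough illustration of that workflow, here is a minimal sketch that attaches to the Chrome instance started above and captures a screenshot. The function name `captureScreenshot` and the idea of passing the module in as the `connect` parameter are illustrative choices, not something the original prescribes; the sketch assumes 'chrome-remote-interface' is installed and Chrome is listening on port 9222.

```javascript
// Sketch: drive an already-running headless Chrome over the DevTools Protocol.
// `connect` is expected to be the function exported by 'chrome-remote-interface';
// it is passed in as a parameter (an illustrative choice) so the logic can be
// exercised without a real browser.
async function captureScreenshot(connect, url, port = 9222) {
  let client;
  try {
    // Attach to the Chrome instance listening on the remote debugging port.
    client = await connect({ port });
    const { Page } = client;

    // Enable Page domain events, navigate, then wait for the load event.
    await Page.enable();
    await Page.navigate({ url });
    await Page.loadEventFired();

    // The protocol returns the image as a base64-encoded string.
    const { data } = await Page.captureScreenshot();
    return Buffer.from(data, 'base64');
  } finally {
    // Always release the connection, even if a step above failed.
    if (client) await client.close();
  }
}

// Usage (assumes `npm install chrome-remote-interface` and Chrome started with
// --remote-debugging-port=9222):
//   const CDP = require('chrome-remote-interface');
//   captureScreenshot(CDP, 'https://example.com')
//     .then((png) => require('fs').writeFileSync('example.png', png));
```

Note how the script has to name the protocol domain (`Page`), enable its events and wait for the load notification explicitly.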
From the official Puppeteer documentation, here is an example that navigates to https://example.com and saves a screenshot as example.png:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
But since 'chrome-remote-interface' already exists, what does Puppeteer do differently? Puppeteer offers a higher-level API over the Chrome DevTools Protocol than the one made available by 'chrome-remote-interface'.
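To give a feel for what "higher level" means for scraping, here is a minimal sketch that waits for a dynamic page to settle and then reads the text of one element. The helper name `scrapeText`, the selector argument, and the choice to pass the Puppeteer module in as a parameter are assumptions made for illustration only.

```javascript
// Sketch: scrape text from a page after client-side rendering has settled.
// `puppeteer` is expected to be the module returned by require('puppeteer');
// it is passed in as a parameter (an illustrative choice) so the logic can be
// exercised without launching a browser.
async function scrapeText(puppeteer, url, selector) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // 'networkidle0' resolves once there have been no network connections for
    // a short while, which gives a Single Page Application time to render.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // Run the callback inside the page against the first matching element.
    // $eval rejects if nothing matches the selector.
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    // Close the browser even if navigation or evaluation failed.
    await browser.close();
  }
}

// Usage (assumes `npm install puppeteer`):
//   scrapeText(require('puppeteer'), 'https://example.com', 'h1')
//     .then((text) => console.log(text));
```

Here the script deals only in browsers, pages and selectors; the protocol domains and notifications stay hidden behind Puppeteer's API.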
