Automated UI tests are a crucial part of our QA process. Before going to production, we run a set of smoke tests which navigate around parts of the UI, recording screenshots and comparing them against a set of golden images. If a significant mismatch occurs, we consider the test failed and halt deploys while we investigate the issue.
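The pass/fail decision can be illustrated with a minimal sketch. It compares two raw RGBA pixel buffers in plain JavaScript as a stand-in for an ImageMagick diff; the helper names `mismatchRatio` and `isSignificantMismatch` and the 1% threshold are assumptions made for this example, not values from our suite.

```javascript
// Hypothetical helper: fraction of pixels that differ between two
// same-sized RGBA buffers (4 bytes per pixel).
function mismatchRatio(actual, golden, tolerance = 0) {
  if (actual.length !== golden.length || actual.length % 4 !== 0) {
    throw new Error('Buffers must be RGBA data of identical dimensions');
  }
  const pixelCount = actual.length / 4;
  if (pixelCount === 0) return 0;
  let differing = 0;
  for (let i = 0; i < actual.length; i += 4) {
    // A pixel counts as different if any channel is off by more than the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - golden[i + c]) > tolerance) {
        differing++;
        break;
      }
    }
  }
  return differing / pixelCount;
}

// Assumed threshold: fail when more than 1% of pixels differ.
function isSignificantMismatch(actual, golden, threshold = 0.01) {
  return mismatchRatio(actual, golden) > threshold;
}

const golden = Uint8Array.from([255, 0, 0, 255, 0, 255, 0, 255]); // red, green
const actual = Uint8Array.from([255, 0, 0, 255, 0, 0, 255, 255]); // red, blue
console.log(mismatchRatio(actual, golden));         // 0.5
console.log(isSignificantMismatch(actual, golden)); // true
```

A small non-zero threshold is one way to tolerate rendering noise such as anti-aliasing differences while still catching real layout regressions.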

Old Architecture

The software stack for our tests looks roughly like this:

The entry point is a Gruntfile which prepares the environment and enqueues the relevant tests for execution. The tests themselves are JavaScript blocks which navigate around our set of test sites and capture and compare screenshots. Doing this reliably is non-trivial, so we use our own navigation library, along with ImageMagick for the image diffing. Underneath this is Nightwatch.js, a handy wrapper around the Selenium browser-automation framework. Finally, we use Firefox to render the pages.
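To give a feel for the test layer, here is a minimal sketch of a Nightwatch test module. It uses only stock Nightwatch commands (`url`, `waitForElementVisible`, `saveScreenshot`, `end`) in place of the custom navigation library described above, and the URL, selector, timeout and screenshot path are placeholders assumed for illustration.

```javascript
// Hypothetical Nightwatch test module. The URL, selector, timeout and
// screenshot path are placeholders, not values from a real suite.
const homepageTest = {
  'homepage renders': function (browser) {
    browser
      .url('http://localhost:8080/')               // navigate to a test site
      .waitForElementVisible('body', 5000)         // wait until the page has rendered
      .saveScreenshot('screenshots/homepage.png')  // capture for later comparison
      .end();                                      // close the browser session
  }
};

module.exports = homepageTest;
```

Nightwatch calls each exported function with a `browser` object whose commands are chainable, which is why the steps read as a single expression.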

Hardware

In our original setup, this all ran on a Mac Mini box, which we'd SSH to from our build agents to trigger tests and retrieve the results. Since Selenium requires a display, we connected a dummy HDMI display dongle to the Mac.

Issues

Aside from the reliance on the Mac, the main problem with the old architecture was performance. The tests shared the same window environment on a single machine, and were run sequentially. Since each test takes multiple screenshots, and each screenshot requires the page to fully load, the time required to run an extensive test suite would have been several hours.

We release frequently, several times a day, and therefore had to limit the tests we ran to a fairly small subset to avoid introducing a bottleneck in our deploy process.

Given these limitations, it was clearly time for an overhaul...

New Architecture

The first step towards version 2.0 of our test setup was to move everything to a Linux environment. This was a natural choice as it meant we could automate the configuration of the hardware the same way we do for our other servers, and spin up new instances as needed. This also appeared to be the best route to take for running Selenium headlessly.

We broadly split the work into three tasks:

1. Headless-ize

Running the stack on Linux didn't require many modifications: Firefox, Selenium, ImageMagick and the various JS packages all had good Linux support already. The important change was getting Nightwatch.js and Selenium to run headlessly, that is, with no display.

We did some experimenting with Xvfb (X virtual framebuffer), an in-memory display server. This emulates an X server and allows a program, in our case the browser, to run in full graphical mode even when no graphics hardware is present. After a few attempts to integrate Selenium nicely with Xvfb, it turned out there was already a great open-source solution which did exactly this: docker-selenium. This project provides Docker images for running Selenium, in either the standalone or the Selenium Grid configuration, and includes Xvfb to support execution in the display-less Docker container. Xvnc is also included for visual debugging.
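As a rough illustration of the Grid configuration, the sketch below builds the `docker run` argument lists for one hub and a number of Firefox nodes. The image names come from the docker-selenium project; the function name `gridCommands`, the container name `selenium-hub` and the published port 4444 are assumptions for this example. The script only prints the commands rather than running them.

```javascript
// Build the `docker run` argument lists for a hub plus N Firefox nodes.
// The container name "selenium-hub" and port 4444 are assumed values.
function gridCommands(nodeCount) {
  const hub = ['run', '-d', '-p', '4444:4444', '--name', 'selenium-hub', 'selenium/hub'];
  const nodes = [];
  for (let i = 0; i < nodeCount; i++) {
    // Each node reaches the hub through the "hub" link alias.
    nodes.push(['run', '-d', '--link', 'selenium-hub:hub', 'selenium/node-firefox']);
  }
  return [hub, ...nodes];
}

// Print the commands instead of executing them.
for (const args of gridCommands(2)) {
  console.log('docker ' + args.join(' '));
}
```

To actually start the containers, each argument list could be passed to Node's `child_process.execFileSync('docker', args)`.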

2. Parallel-ize

With these Dockerized, headless Selenium nodes in place, the solution to the performance problem was simple: run the tests in parallel!

To manage the scheduling of the test jobs at the front end, we used the test_workers feature introduced in Nightwatch v0.7. This is a config setting that enables parallel execution and lets you specify the number of workers, each of which runs tests in its own child process.

All we had to do was choose a reasonable number of workers and launch the same number of Selenium nodes. We found that allocating one test and one Selenium instance per CPU was a good strategy.
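A configuration along these lines might look like the sketch below. It assumes a Nightwatch release that accepts a JavaScript config file (nightwatch.conf.js), and the test folder, hub host and port are placeholder values rather than our real settings. The worker count is derived from the CPU count to follow the one-per-CPU rule of thumb.

```javascript
const os = require('os');

// One worker per CPU, following the rule of thumb above.
const workerCount = os.cpus().length;

const nightwatchConfig = {
  src_folders: ['tests'],          // hypothetical test directory
  selenium: {
    start_process: false           // Selenium runs in Docker, not as a local process
  },
  test_workers: {
    enabled: true,                 // run test files in parallel
    workers: workerCount
  },
  test_settings: {
    default: {
      selenium_host: 'localhost',  // assumed address of the Selenium hub
      selenium_port: 4444,
      desiredCapabilities: { browserName: 'firefox' }
    }
  }
};

module.exports = nightwatchConfig;
```

The number of Selenium nodes launched would then need to match `workerCount`, otherwise workers end up queuing for a free browser.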

3. Ansible-ize

The final step was to ensure that the configuration and deployment of the new environment was fully automated. We use Ansible for this purpose at Squarespace, so this step involved the creation of some new playbooks.

For the Selenium part, we used Ansible's Docker integration. The following snippet shows the tasks that build and run the Selenium Firefox nodes:

# Build images if the local repo has changed.
- name: build docker-selenium images
  docker_image:
    name: "{{ item.name }}"
    path: "/opt/docker-selenium/{{ item.path }}"
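    # Assumed states: "build" when the repo changed, otherwise "present".
    state: '{{ (DOCKER_SELENIUM_REPO.changed|default(false)) | ternary("build", "present") }}'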
    tag: local
    nocache: true
  with_items:
    - name: selenium/hub
      path: Hub
    - name: selenium/node-firefox
      path: NodeFirefox
  register: DOCKER_SELENIUM_IMAGES

# Ensure node-firefox containers are up to date and running. 
- name: reload node-firefox docker containers
  docker:
    image: selenium/node-firefox:local
    count: "{{ node_firefox_count }}"
    state: '{{ DOCKER_SELENIUM_IMAGES.changed|default(false) | ternary("reloaded", "started") }}'
    links:
      - "selenium-hub:hub"

Results

Here are the results, across the set of 10 tests that we ran in staging:

  
Hardware                                            Execution     Time (mm:ss)
Mac Mini, OS X 10.8, 4 CPUs @ 3.3 GHz, 16 GB RAM    Sequential    11:00
CentOS Linux 7, 8 CPUs @ 3.47 GHz, 16 GB RAM        Parallel      2:39

In terms of execution time, we achieved a speedup of approximately 4x. This not only streamlined our existing deploy process, but also paved the way for increasing the number of tests we run simply by adding more nodes to the cluster.
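The speedup figure follows directly from the two timings above, as this short calculation shows. The second half is only a rough bound: it assumes all 10 tests take equally long, that 8 workers were used (one per CPU), and that per-test time is unchanged between the two machines, none of which is guaranteed.

```javascript
// Convert an "mm:ss" string into seconds.
function toSeconds(mmss) {
  const [minutes, seconds] = mmss.split(':').map(Number);
  return minutes * 60 + seconds;
}

const sequential = toSeconds('11:00'); // 660 seconds
const parallel = toSeconds('2:39');    // 159 seconds
const speedup = sequential / parallel;
console.log(speedup.toFixed(2)); // "4.15"

// Rough ideal bound under the assumptions stated above.
const tests = 10;
const workers = 8;
const perTest = sequential / tests;                         // 66 seconds per test
const idealParallel = Math.ceil(tests / workers) * perTest; // 2 rounds of 66 seconds
console.log(idealParallel); // 132
```

Under those assumptions the ideal parallel run would take about 2:12, so the measured 2:39 is close to what 8 workers can deliver for 10 tests.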

Sep 1 Turbocharging Our UI Tests
