Turbocharging Our UI Tests

Automated UI tests are a crucial part of our QA process. Before going to production, we run a set of smoke tests which navigate around parts of the UI, recording screenshots and comparing them against a set of golden images. If a significant mismatch occurs, we consider the test failed and halt deploys while we investigate the issue.

Old Architecture

The software stack for our tests looks roughly like this:

The entry point is a Gruntfile which prepares the environment and enqueues the relevant tests for execution. The tests themselves are JavaScript blocks which navigate around our set of test sites and capture/compare screenshots. To do this reliably is non-trivial, so we use our own navigation library, along with ImageMagick for the image diffing. Underneath this is Nightwatch.js, a handy wrapper around the Selenium browser-automation framework. Finally, we use Firefox to render the pages.
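To make the pass/fail decision concrete, here is a minimal sketch in plain JavaScript. It is not our navigation library, and it does not call ImageMagick, which does the real diffing. The function names `mismatchRatio` and `screenshotMatches` and the 1% threshold are hypothetical, and both images are assumed to be same-sized RGBA byte arrays.

```javascript
// Hypothetical sketch: compare two same-sized RGBA pixel buffers.
// The real suite delegates image diffing to ImageMagick.

// Fraction of pixels whose RGBA values differ between the two images.
function mismatchRatio(actual, golden) {
  if (actual.length !== golden.length || actual.length % 4 !== 0) {
    throw new Error('images must be same-sized RGBA buffers');
  }
  const pixelCount = actual.length / 4;
  if (pixelCount === 0) return 0;
  let differing = 0;
  for (let i = 0; i < actual.length; i += 4) {
    if (
      actual[i] !== golden[i] ||
      actual[i + 1] !== golden[i + 1] ||
      actual[i + 2] !== golden[i + 2] ||
      actual[i + 3] !== golden[i + 3]
    ) {
      differing += 1;
    }
  }
  return differing / pixelCount;
}

// A screenshot passes when the mismatch stays at or below the threshold.
// The 0.01 default is an assumed value for illustration only.
function screenshotMatches(actual, golden, threshold = 0.01) {
  return mismatchRatio(actual, golden) <= threshold;
}
```

A threshold above zero tolerates small rendering differences, such as anti-aliasing, while still catching layout regressions.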


In our original setup, this all ran on a Mac Mini, which we'd SSH into from our build agents to trigger tests and retrieve the results. Since the browser that Selenium drives requires a display, we connected a dummy HDMI display dongle to the Mac.


Aside from the reliance on the Mac, the main problem with the old architecture was performance. The tests shared the same window environment on a single machine, and were run sequentially. Since each test takes multiple screenshots, and each screenshot requires the page to fully load, the time required to run an extensive test suite would have been several hours.

We release frequently, several times a day, and therefore had to limit the number of tests we ran to a fairly small subset to avoid introducing a bottleneck in our deploy process.

Given these limitations, it was clearly time for an overhaul...

New Architecture

The first step towards version 2.0 of our test setup was to move everything to a Linux environment. This was a natural choice as it meant we could automate the configuration of the hardware the same way we do for our other servers, and spin up new instances as needed. This also appeared to be the best route to take for running Selenium headlessly.

We broadly split the work into three tasks:

1. Headless-ize

Running the stack on Linux didn’t require many modifications. Firefox, Selenium, ImageMagick and the various JS packages all already had good Linux support. The important change was getting Nightwatch.js and Selenium to run headlessly, that is, with no display.

We experimented with Xvfb (X virtual framebuffer), an in-memory display server. It emulates an X server and allows a program, in our case the browser, to run in full graphical mode even when no graphics hardware is present. After a few attempts to integrate Selenium nicely with Xvfb, we found an open-source project that already did exactly this: docker-selenium. This project provides Docker images for running Selenium, in either the standalone or the Selenium Grid configuration, and includes Xvfb to support execution in the display-less Docker container. Xvnc is also included for visual debugging.

2. Parallel-ize

With these Dockerized, headless Selenium nodes in place, the solution to the performance problem was simple: run the tests in parallel!

To manage the scheduling of the test jobs at the front end, we used the test_workers feature introduced in Nightwatch v0.7. This is a config setting that enables parallel execution and lets you specify the number of workers.

All we had to do was choose a reasonable number of workers and launch the same number of Selenium nodes. We found that allocating one test worker and one Selenium node per CPU was a good strategy.
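As a rough illustration, a Nightwatch config along these lines enables test_workers with one worker per CPU. This is a sketch in Nightwatch's legacy settings format, not our production config: the `buildConfig` helper, the `tests` folder, and the `localhost:4444` hub address are all assumptions.

```javascript
// nightwatch.conf.js (hypothetical sketch, not our production config)
const os = require('os');

// Build a config that runs one test worker per CPU against a Selenium hub.
function buildConfig(cpuCount = os.cpus().length) {
  return {
    src_folders: ['tests'], // assumed test directory
    selenium: { start_process: false }, // Selenium runs in Docker, not locally
    test_workers: { enabled: true, workers: cpuCount },
    test_settings: {
      default: {
        selenium_host: 'localhost', // assumed hub address
        selenium_port: 4444, // Selenium's default port
        desiredCapabilities: { browserName: 'firefox' },
      },
    },
  };
}

module.exports = buildConfig();
```

Deriving the worker count from the CPU count keeps the config in step with the hardware, but the number of Selenium nodes still has to be set to match on the Docker side.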

3. Ansible-ize

The final step was to ensure that the configuration and deployment of the new environment were fully automated. We use Ansible for this purpose at Squarespace, so this step involved creating some new playbooks.

For the Selenium part, we used the Ansible Docker integration. The following snippet shows the tasks to build and run the Selenium Firefox Nodes:

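# Build images if the local repo has changed.
# NOTE: the module names and the with_items, register and links keys were
# lost from this copy of the snippet and are reconstructed from context.
- name: build docker-selenium images
  docker_image:
    name: "{{ item.name }}"
    path: "/opt/docker-selenium/{{ item.path }}"
    # The original expression is truncated after default(false); the ternary
    # values here are an assumption modeled on the task below.
    state: '{{ (DOCKER_SELENIUM_REPO.changed|default(false)) | ternary("build", "present") }}'
    tag: local
    nocache: true
  with_items:
    - name: selenium/hub
      path: Hub
    - name: selenium/node-firefox
      path: NodeFirefox
  register: DOCKER_SELENIUM_IMAGES

# Ensure node-firefox containers are up to date and running.
- name: reload node-firefox docker containers
  docker:
    image: selenium/node-firefox:local
    count: "{{ node_firefox_count }}"
    state: '{{ DOCKER_SELENIUM_IMAGES.changed|default(false) | ternary("reloaded", "started") }}'
    links:
      - "selenium-hub:hub"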


Here are the results, across the set of 10 tests that we ran in staging:

Hardware                                Execution    Time (mm:ss)
Mac Mini, OS X 10.8, 4 CPUs @ 3.3 GHz   Sequential   11:00
CentOS Linux 7, 8 CPUs @ 3.47 GHz       Parallel     2:39

In terms of execution time, we achieved a speedup of approximately 4x. This not only streamlined our existing deploy process, but also paved the way for increasing the number of tests we run simply by adding more nodes to the cluster.
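For reference, the speedup figure follows directly from the two timings above. The helper names in this small sketch are hypothetical:

```javascript
// Convert an "mm:ss" string into seconds.
function toSeconds(mmss) {
  const [minutes, seconds] = mmss.split(':').map(Number);
  return minutes * 60 + seconds;
}

// Speedup is the old execution time divided by the new one.
function speedup(before, after) {
  return toSeconds(before) / toSeconds(after);
}

console.log(speedup('11:00', '2:39').toFixed(2)); // 660 s / 159 s, prints "4.15"
```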

