Can Friendly Competition Lead to Better Models?

Here on Squarespace’s Strategy and Analytics team, we build models that predict customer lifetime value, forecast customer service demand, and even determine how much we should spend on those ubiquitous Squarespace ads you hear on your favorite podcast.

We subject our models to code review by our peers, but that process does not always address larger, system-level questions: how do we know whether a model is the best our team could build? How does it perform on the specified task relative to models created through alternative approaches?

During Squarespace’s most recent Hack Week, we experimented with a different approach to model building: an internal Kaggle competition. Kaggle is a platform for data science competitions, which follow a simple recipe: 1) define a prediction task, 2) provide training data to participants, and 3) score submissions on a subset of the data and display the results on a leaderboard. Netflix is often credited with popularizing the use of data science competitions to solve business problems via the 2009 Netflix Prize, in which teams competed to build the best model for predicting movie ratings. However, the idea of a “common task framework” actually dates back to at least the 1980s, when DARPA challenged teams of researchers to produce the best possible rules for machine translation.
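
As a rough illustration of that recipe, here is a minimal scoring sketch in Python. It assumes submissions are CSV files of predicted probabilities keyed by a trial id and uses log loss as the metric; in practice Kaggle’s platform handles the scoring and leaderboard, so the file names, columns, and metric below are stand-ins.

```python
import pandas as pd
from sklearn.metrics import log_loss

# Hypothetical files: a hidden holdout with true labels and one team's
# submission of predicted probabilities, keyed by an anonymized trial id.
HOLDOUT_PATH = "holdout_labels.csv"        # columns: trial_id, subscribed
SUBMISSION_PATH = "team_a_submission.csv"  # columns: trial_id, predicted_prob

def score_submission(holdout_path: str, submission_path: str) -> float:
    """Score one submission against the hidden holdout (lower log loss is better)."""
    holdout = pd.read_csv(holdout_path)
    submission = pd.read_csv(submission_path)
    merged = holdout.merge(submission, on="trial_id", how="left")
    # Missing or out-of-range predictions would be a rule violation in a real
    # competition; here we simply fill and clip to valid probabilities.
    preds = merged["predicted_prob"].fillna(0.5).clip(1e-6, 1 - 1e-6)
    return log_loss(merged["subscribed"], preds)

if __name__ == "__main__":
    print(f"log loss: {score_submission(HOLDOUT_PATH, SUBMISSION_PATH):.4f}")
```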

For our internal competition, we wanted to predict subscription rates of customers who start a free trial on Squarespace. The dataset for this competition included anonymized information on customers’ marketing channels, geographic locations, product usage, previous trials, and, of course, whether or not the customers subscribed to Squarespace within 28 days of starting a trial. We used Kaggle’s InClass platform to host a private competition, encrypted unique identifiers, and uploaded no personally identifiable customer information to Kaggle’s servers.
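
Our exact preparation pipeline isn’t shown here, but the sketch below gives a flavor of the anonymization step. It assumes the trial records live in a pandas DataFrame and uses a salted one-way hash as a stand-in for whatever encryption scheme is actually applied to the identifiers; the column names are hypothetical.

```python
import hashlib
import pandas as pd

SALT = "replace-with-a-secret-salt"  # hypothetical; never uploaded with the data

def anonymize_id(raw_id: str) -> str:
    """One-way hash so uploaded rows can't be joined back to customer records."""
    return hashlib.sha256((SALT + raw_id).encode("utf-8")).hexdigest()

def build_competition_table(trials: pd.DataFrame) -> pd.DataFrame:
    """Keep only non-identifying features plus the 28-day subscription label."""
    out = trials[["marketing_channel", "country", "pages_created",
                  "previous_trials", "subscribed_within_28d"]].copy()
    out.insert(0, "trial_id", trials["customer_id"].map(anonymize_id))
    return out
```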

The competition was successful in generating insights from the data. Teams took diverse approaches, experimenting with algorithms ranging from gradient-boosted decision trees to neural nets. Multiple teams independently determined that training on a small subset of the data produced results similar to those from training on the full dataset, a finding that cut training time by a factor of ten. Another surprise was that, relative to other features, seasonality was not a major driver of trial conversion: either the seasonal effects were not strong, or they were captured indirectly through other variables in the dataset.
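
To make the subsampling finding concrete, here is a sketch of the kind of learning-curve check a team might run: train a gradient-boosted model on increasing fractions of the training data and compare holdout AUC. The column names, metric, and model settings are illustrative, not the ones used in the competition.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature and label columns; the real competition data is not public.
FEATURES = ["pages_created", "previous_trials", "days_active_in_trial"]
LABEL = "subscribed_within_28d"

def auc_at_fraction(df: pd.DataFrame, fraction: float, seed: int = 0) -> float:
    """Train on a random fraction of the training split and score a fixed holdout."""
    train, test = train_test_split(df, test_size=0.2, random_state=seed,
                                   stratify=df[LABEL])
    sample = train.sample(frac=fraction, random_state=seed)
    model = GradientBoostingClassifier(random_state=seed)
    model.fit(sample[FEATURES], sample[LABEL])
    return roc_auc_score(test[LABEL], model.predict_proba(test[FEATURES])[:, 1])

# e.g. compare AUC when training on 10% vs. 100% of the data:
# df = pd.read_csv("training_data.csv")
# print(auc_at_fraction(df, 0.1), auc_at_fraction(df, 1.0))
```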

The competition format has its downsides, though. First, having multiple teams build competing models for the same task is not the most efficient use of resources. Second, the focus on leaderboard results above all else can push teams to neglect system design considerations, such as system-level dependencies and production runtimes.

But the competition was by no means a waste of effort. Participants gained familiarity with this critical dataset, and the friendly competition format encouraged teams to collaborate effectively and push their limits. All of the code produced for the competition was stored in a shared repository, so any individual or small team building a model for a business application would not have to start from scratch. The “common task framework” could be something we revisit in the future, especially in cases where model performance is more important than interpretability.

Want to join our team of passionate data scientists? Check out our open positions.
