
Checkout Surveys: A Data Science Approach

Squarespace’s Data Science team helps stakeholders across the company make better strategic decisions using data. Marketing attribution, one major focus of the team’s work, is how we answer the question: “How much credit does each form of marketing deserve for a sale or conversion?” Developing formal models is the most rigorous approach to answering that question, and from those models we can learn which forms of marketing are most critical to driving growth and subsequently invest more in them.

The primary task of marketing attribution is to reconstruct an individual user’s journey to Squarespace by:

  1. Identifying the forms of marketing that the user encountered before subscribing (inferred from certain interactions, or “touchpoints”), and

  2. Assigning a relative measure of importance to each marketing channel (search, social, TV, etc.) in order to properly divide the credit for the individual’s subscription; a rough sketch of this credit-splitting step follows.
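
As a rough illustration of the second step, the sketch below splits one subscription’s credit across the channels in a user’s journey using a simple position-based rule. The weights, function name, and journey representation are illustrative only and are not our production attribution model.

```python
from collections import defaultdict

def attribute_credit(touchpoints, first_weight=0.4, last_weight=0.4):
    """Split one subscription's credit across the channels in a user's journey.

    Uses a simple position-based rule: extra weight on the first and last
    touchpoints, with the remainder shared evenly by the middle ones. The
    weights here are illustrative, not a production model.
    """
    credit = defaultdict(float)
    if not touchpoints:
        return credit
    if len(touchpoints) == 1:
        credit[touchpoints[0]] = 1.0
        return credit
    middle = touchpoints[1:-1]
    middle_weight = (1.0 - first_weight - last_weight) / len(middle) if middle else 0.0
    credit[touchpoints[0]] += first_weight
    credit[touchpoints[-1]] += last_weight
    for channel in middle:
        credit[channel] += middle_weight
    # With no middle touchpoints, renormalize first/last so credit sums to 1.
    if not middle:
        total = first_weight + last_weight
        for channel in credit:
            credit[channel] /= total
    return credit

# One user's journey: a search click, a TV spot recalled via the survey, a social ad.
print({k: round(v, 3) for k, v in attribute_credit(["search", "tv", "social"]).items()})
# {'search': 0.4, 'tv': 0.2, 'social': 0.4}
```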

One of the key inputs to our attribution model at Squarespace is the checkout survey, shown to users just after they subscribe. The survey contains one question: “How did you hear about us?” The data we obtain from the survey is especially important as it allows us to gauge the impact of difficult-to-measure marketing channels, like whether customers heard about us on TV, on the radio, or even from this blog! 

Recently, my team hypothesized that from a user experience perspective, there were opportunities to improve the clarity and integrity of marketing performance data emerging from the survey. We were optimistic that we could increase the number of users completing the survey, raise the percentage of responses that could be mapped to a known marketing channel, and eliminate opportunities for confusion, all of which would have a positive downstream effect on marketing performance data.

So, in order to quantify and realize this effect, we tackled the overarching question: to what extent are user responses, and consequently marketing performance data, sensitive to survey design, and what quantitative signals can we use to create an optimal survey for marketing attribution?

In order to answer these questions, we broke the problem down into two primary steps:

  1. Identify the mechanisms within survey design that make certain responses more or less likely, and

  2. Define success metrics that can confirm the effectiveness of survey design changes.

Mechanism 1: Response Positioning

The main design mechanism that we tied to a marketing channel’s performance in the survey was its relative position: is the channel shown as soon as the user opens the survey, or is it hidden within a lower-level menu?

In the above hypothetical example, the design of Survey #1 privileges Podcast over the remaining channels because the user must do additional work, clicking into “Other,” in order to see TV, Search, and Radio. The survey over-attributes Podcast because we would likely receive Podcast responses from users who heard about us more prominently through another channel like Radio, but overlooked that channel because it was omitted from the top level of the survey.

With Survey #2, channels are shown to the user in a uniform way, and thus all options have a roughly equal chance of selection. The downside of listing every possible channel in a single menu is the risk of confusion and eye strain, especially when the list is long. In situations like ours, however, where distinguishing between many marketing sources is important, creating inner submenus is necessary.

Therefore, we can summarize the first major trade-off we identified within survey design for marketing attribution: that between a multi-level and a single-level survey. The multi-level survey allows for more precision while risking a drop-off in response rate, whereas a single-level survey minimizes drop-off while compromising the ability to identify specific channels or campaigns.
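
To make the distinction concrete, here is a minimal sketch of how the two structures might be represented as option trees; the labels and nesting are illustrative and do not reflect our production survey.

```python
# Two ways to present the same set of channels; labels and nesting are
# illustrative, not the production survey.
single_level = {
    "How did you hear about us?": ["Podcast", "TV", "Search", "Radio", "Other"],
}

multi_level = {
    "How did you hear about us?": {
        "Podcast": None,                     # selectable as soon as the survey opens
        "Other": ["TV", "Search", "Radio"],  # hidden one click deeper
    },
}

def visible_at_top_level(survey):
    """Return the options a respondent sees without any extra clicks."""
    _question, options = next(iter(survey.items()))
    return list(options)

print(visible_at_top_level(single_level))  # ['Podcast', 'TV', 'Search', 'Radio', 'Other']
print(visible_at_top_level(multi_level))   # ['Podcast', 'Other']
```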

Mechanism 2: Language & Inference

The second design mechanism that we identified was the language used in the survey to represent marketing channels. This became a challenge when distinguishing between similar but distinct forms of marketing, such as YouTube content creators who promote our brand versus YouTube pre-roll video advertisements purchased directly from the platform. To a user, the word “YouTube” may encompass both of these categories; to distinguish between the sources in attribution, therefore, we may need to resort to less intuitive language like “pre-roll” or “content sponsorship.” Our goal is to ensure that survey options map both to concepts that are intuitive to respondents and to the forms of marketing we intend to capture.

In Survey #3, the choice of language obscures marketing data, and not just as it relates to YouTube. Another example arises from the unintuitive phrase “an audio ad,” a catch-all for sources like podcasts, streaming audio, and radio. Does “podcast” constitute “an audio ad”? We’d argue yes, but because the user has to make a mental association between the language in the survey (“an audio ad”) and the marketing source that actually drove them to Squarespace as they understand it (such as “podcast”), we are likely to under-attribute these channels.
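
To illustrate the mapping problem, here is a minimal sketch of how survey language might be translated into attribution channels; the option labels, channel names, and mapping are hypothetical.

```python
# Hypothetical mapping from survey option labels to internal attribution channels.
# Ambiguous labels map to more than one channel and need either a follow-up
# question (a submenu) or a modeling assumption downstream.
OPTION_TO_CHANNELS = {
    "YouTube":     {"youtube_creator", "youtube_preroll"},   # ambiguous to the respondent
    "A podcast":   {"podcast"},
    "An audio ad": {"podcast", "streaming_audio", "radio"},  # catch-all, hard to attribute
    "TV":          {"tv"},
}

def resolve(option_label):
    channels = OPTION_TO_CHANNELS.get(option_label, set())
    if len(channels) == 1:
        return next(iter(channels))  # cleanly attributable
    return None                      # ambiguous or unmapped: needs a submenu or write-in

print(resolve("A podcast"))    # 'podcast'
print(resolve("An audio ad"))  # None -> likely under- or mis-attributed without more detail
```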

One secondary concern is our ability to distinguish between the audio sources themselves, which in Survey #3 would require a second dropdown within “an audio ad,” leading us back to the first trade-off regarding survey leveling.

Thus, we can summarize a second trade-off in survey design for marketing attribution: that between simple and complex language. The simple approach increases our confidence in user input while sacrificing channel precision, whereas complex language gives us greater flexibility in channel identification while potentially creating confusion and harming data integrity.

Converging on an Optimal Checkout Survey

In recognition of the trade-offs in survey structure and language, we proposed the following schema for structuring the checkout survey for marketing attribution. The illustration features a shortened list of example options at each level, whereas in reality the survey contains more options:

The guiding principles that led us to this survey structure are:

  • Prioritization of channels that are easy to convey in simple language at the top of the survey (minimizing both drop-off and linguistic confusion) 

  • Creation of categories for groups of thematically similar channels that would still be useful for us to retain in the event of drop-off (for example, a category that encompasses all possible sources under the overarching category of “word of mouth”) 

  • Relegation of details, such as the name of a specific campaign, to the bottom level of the survey in the form of a write-in that is optional for the user and collected only as supplemental information above and beyond the channel itself

  • Sensitivity to eye strain, such that no level of the survey has an exceedingly large number of options (a threshold we judged qualitatively)

  • Prevention of bias from the order of menu options through randomization (see the sketch below)
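
The randomization principle from the last bullet might look like the minimal sketch below: shuffle the top-level options per respondent while keeping catch-all options such as “Other” pinned to the bottom. The option labels and function are hypothetical.

```python
import random

def ordered_options(options, pinned_last=("Other",), seed=None):
    """Shuffle top-level options per respondent to avoid order bias,
    keeping catch-all options such as "Other" pinned to the bottom.
    Option labels are illustrative; the production survey differs."""
    rng = random.Random(seed)  # e.g. seed on a session id for reproducibility
    shuffled = [o for o in options if o not in pinned_last]
    rng.shuffle(shuffled)
    return shuffled + [o for o in options if o in pinned_last]

print(ordered_options(["TV", "Podcast", "Search", "Word of mouth", "Other"], seed=42))
```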

Additionally, we removed options from the old survey that pertained to past campaigns or to channels that were otherwise unnecessary to track.

Defining Success

After drafting a survey with the new structure outlined above, we asked the question: “How will we know that the new survey is superior to the old one?” With the understanding that design decisions would shift the relative distribution of channel credit within the survey, we converged on success metrics that pertained primarily to survey completion and the quality of responses. Specifically, we chose four metrics, sketched in code after the list:

  • Attributable responses (the % of responses that we can link to marketing). This tells us whether our choices of channel positioning and categorization affected our ability to collect responses that contain useful marketing data. Responses in “Other,” gibberish write-ins, and responses in other miscellaneous categories are excluded from this proportion.

  • Attributed subscriptions (of all subscriptions, the % that can be attributed to any specific marketing or non-marketing source). This metric helps us answer the question: “Does the survey increase our knowledge of prevalent marketing sources across the business?” In other words, do we now have marketing data from individuals who otherwise wouldn’t have provided us with any information?

  • Completion rate (the % of subscribers that click “Submit” on the checkout survey). This gauges whether the survey creates any confusion, fatigue, or some other reason to click “Skip.”

  • “Other” response rate (the % of people who select “Other” from the top-level menu of the survey). This metric is secondary in importance, but nevertheless tells us whether the options we’ve placed at the top level of the survey capture a suitably large percentage of what users already have in mind.
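
As a rough sketch of how these four metrics might be computed from raw survey records (the column names and values here are hypothetical, not our actual schema):

```python
import pandas as pd

# Hypothetical per-subscriber records; columns and values are illustrative.
responses = pd.DataFrame({
    "subscriber_id":    [1, 2, 3, 4, 5],
    "submitted":        [True, True, True, False, True],  # clicked "Submit" vs "Skip"
    "top_level_choice": ["TV", "Other", "Search", None, "Other"],
    "mapped_channel":   ["tv", None, "search", None, "word_of_mouth"],  # None = gibberish/unmapped
})

submitted = responses[responses["submitted"]]

completion_rate     = responses["submitted"].mean()                 # % of subscribers who submit
attributable_share  = submitted["mapped_channel"].notna().mean()    # % of responses linked to marketing
attributed_subs     = responses["mapped_channel"].notna().mean()    # % of all subscriptions with a known source
other_response_rate = (submitted["top_level_choice"] == "Other").mean()

print(completion_rate, attributable_share, attributed_subs, other_response_rate)
```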

Quantifying Impact

As we do with many other features, we tested the implementation as an A/B test that exposed a small number of new subscribers to the new survey. While metrics such as response rate and drop-off were directly measurable from the surveys emerging from the test, predicting the downstream effects of the new survey on marketing attribution, and whether the survey “increases the pie” of users tagged with marketing data, was a more complex task.

[Diagram: marketing performance metrics in the test and control groups are driven by the survey, its interaction with other touchpoints, and the attribution procedure]

As the above diagram demonstrates, marketing performance metrics emerging from the test and control groups are a function not only of the new survey, but also of its interaction with other touchpoints and the attribution procedure. We therefore need to control for these interaction effects in order to credit observed differences in marketing performance during the A/B test to the design improvements themselves.

To isolate the effect of the survey, we took a simulation-based approach: we pooled all of the subscribers without surveys in both groups, then repeatedly sampled them at random into test and control categories and recalculated attribution, estimating how sensitive downstream performance was to variation among users who don’t take the survey.
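
In spirit, the simulation looks like the sketch below: repeatedly re-split the pooled, unsurveyed subscribers into test and control, rerun attribution, and read off the distribution of a channel’s credit shift. The function names, arguments, and the `run_attribution` stand-in are hypothetical placeholders for our actual pipeline.

```python
import numpy as np

def credit_shift_with_ci(test_surveyed, control_surveyed, unsurveyed,
                         run_attribution, channel,
                         test_fraction=0.5, n_sims=1000, seed=0):
    """Estimate a channel's credit shift between survey variants with a
    simulation-based confidence interval.

    Subscribers who never saw a survey are pooled and repeatedly re-split
    into test/control, and attribution is recomputed each time, so the
    interval reflects how sensitive downstream credit is to that split.
    `run_attribution(surveyed, unsurveyed) -> {channel: credit}` stands in
    for the real pipeline; all names and defaults here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    pool = np.array(unsurveyed, dtype=object)
    n_test = int(len(pool) * test_fraction)  # should mirror the A/B allocation
    shifts = []
    for _ in range(n_sims):
        perm = rng.permutation(len(pool))
        test_credit = run_attribution(test_surveyed, pool[perm[:n_test]]).get(channel, 0.0)
        control_credit = run_attribution(control_surveyed, pool[perm[n_test:]]).get(channel, 0.0)
        shifts.append(test_credit - control_credit)
    lo, hi = np.percentile(shifts, [2.5, 97.5])
    return float(np.mean(shifts)), (float(lo), float(hi))
```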

With this approach, a marketing channel’s change in credit between the old and the new survey could be measured with confidence intervals estimated directly from the sampled populations of users run through attribution. By the conclusion of the A/B test and the subsequent analysis, we estimated that the new survey design drove a 10% increase in response rate, cut the number of unattributable or unmappable responses roughly in half, and increased the proportion of total subscribers tagged with marketing data: overall, a major success.

Conclusion

Separating the “true” effect of different forms of marketing from the intricacies of measurement is a major challenge of gathering performance data, particularly from surveys. In an online environment, checkout surveys are only one way to navigate the tension between measurement and impact and to collect feedback from users on the effectiveness of marketing efforts. In its simplest form, our framework of identifying the mechanisms that affect data collection and then testing changes to those mechanisms, applied not just to the checkout survey but to other areas of focus on our team, combines basic principles of design with quantitative insights to create a truly analytical approach to understanding our users and driving value for our business.
