At GetYourGuide, data-driven decisions are core to our culture. So what happens when we have to scale up A/B testing without bogging down the entire data department? Data Analyst Dima Vecheruk and Senior Data Engineer Eugene Klyuchnikov walk us through how they got their team running tests in self-service mode.
Can you describe the mission of the Data team?
On the product development side, our major contribution is providing tools and analysis around A/B testing, commonly referred to as split testing. The Data Analytics and Data Platform teams have a common goal of enabling all teams to make data-driven decisions. It’s a core part of our company culture.
During a recent data team presentation, a colleague summed it up nicely: “It wouldn’t be a GetYourGuide presentation if I didn’t tell you how we’re going to measure impact.”
What is A/B Testing and what are the different stages?
To recap, an A/B test is a controlled experiment where a population of website users is randomly split into two test groups, a control and a treatment group. We expose the treatment group to a new version of the same web page with some element of the experience changed.
If we measure conversion rate or another success metric in both groups and see a statistically significant difference, we can attribute that difference to the change we introduced, because random assignment makes the two groups comparable in every other respect.
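As a rough sketch of that comparison, assuming the success metric is a conversion rate, a two-proportion z-test is one common way to check whether the gap between the groups is larger than chance alone would explain. The function name and the visitor counts below are made up for illustration and do not describe GetYourGuide's actual tooling.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 10,000 visitors per group, 500 vs. 560 conversions.
z, p_value = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below the significance level chosen before the test (0.05 is a common choice) would count as a statistically significant difference.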
A classic frequentist A/B test follows a strict methodology that consists of three parts:
Define a quantitative hypothesis. This includes setting the baseline of the metric we’re trying to affect. We estimate how much we expect the metric to shift due to our treatment. Then we calculate the sample size required to measure such a difference.
Begin the trial. Wait until the sample size is reached.
Stop the trial. Here we interpret the results. Either the results show a statistically significant difference, or they are inconclusive.
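The sample size calculation in the first step can be sketched as follows. This is a minimal example, assuming a conversion-rate metric, a two-sided test, and the standard normal-approximation formula for comparing two proportions. The function name, the default significance level of 0.05, and the default power of 0.8 are illustrative assumptions rather than the settings described in this interview.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # Critical values for the two-sided significance level and desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two groups.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical inputs: 5% baseline conversion, 10% expected relative lift.
n = sample_size_per_group(baseline_rate=0.05, relative_lift=0.10)
print(f"Users needed per group: {n}")
```

Smaller expected shifts drive the required sample size up quickly, which is why the hypothesis has to be quantified before the trial begins.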
How did you change the traditional A/B testing flow?
From a practical perspective, we needed to enhance this flow with additional steps:
Define a quantitative hypothesis.
Start the trial. Wait until the sample size is reached.
Monitor that the test is performing as expected (e.g., check that there are no bugs or unexpected behaviors that cause money to burn).
Stop the trial and interpret the results.
Dig deeper into the experiment’s impact on user experience to understand why it did or did not work as expected.
Summarize the impact of all experiments that a team ran in terms of business value.
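To make the monitoring step more concrete, here is a sketch of one widely used health check, a sample ratio mismatch test. The interview does not say which checks are actually used, so treat this as an assumed example: it compares the observed group sizes against the intended split with a chi-square test, and a very small p-value hints at a bug in assignment or tracking. The function name and counts are hypothetical.

```python
from math import erfc, sqrt


def sample_ratio_mismatch_p_value(n_control, n_treatment, expected_share_control=0.5):
    """P-value of a chi-square test that group sizes match the intended split."""
    total = n_control + n_treatment
    expected_control = total * expected_share_control
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts for a test that was meant to split traffic 50/50.
p_srm = sample_ratio_mismatch_p_value(10_000, 10_500)
print(f"Sample ratio mismatch p-value: {p_srm:.5f}")
```

A check like this can run while the trial is still collecting data, because it looks only at group sizes and not at the success metric.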
Why did you need a new approach to testing?
In the past, teams at GetYourGuide used custom solutions, as the development speed of our A/B testing tools was not consistent across our web and app platforms. As a result, teams often had to speak to a data analyst to help with planning tests or digging deeper into the effects.
Analysts had to write custom code to measure experiments affecting metrics beyond conversion rate, which became increasingly common as product teams became more specialized.
As our engineering and product teams grew steadily and their capacity to ship experiments increased, providing timely support by writing custom queries and even reusable notebooks was no longer enough.
It was also tricky to onboard and train people in A/B testing, as there were too many exceptions between product teams. While we already had a working experiment dashboard built in Looker, not all teams could use it, and it only worked fully for a subset of standardized experiments.
How did you improve the experimentation process?
To avoid becoming a bottleneck, data analysts and engineers kicked off a project to streamline the process as much as possible. We tried to make the steps outlined above available to teams in self-service mode.
As a result, we created an updated architecture. The new model supports experiments from all product teams and provides fast reporting. We use Looker, a business intelligence tool for data monitoring and exploration. We also built a set of tools, mostly Looker dashboards, that cover the needs around planning, monitoring, and analyzing tests.
Senior Data Engineer Zoran Stipanicev shares another Looker use case here.
Here are some new things we’ve introduced:
The new and improved data architecture