Experimentation looks different in each organization, and often changes within each one as the business grows and diversifies. In some companies, a few people spearhead the experimentation effort and are responsible for everything end to end. In others, responsibilities are split: engineers drive the implementation work while data scientists take charge of the data and analysis. The right people to involve vary greatly between companies, and even within them, depending on organizational structure and culture. Regardless of how your company operates, collaboration fuels efficient experimentation.
For teams that work cross-functionally, like Spotify, a typical experiment goes through the following stylized phases:
- Hypothesizing. The product manager forms a hypothesis based on earlier learnings (user research, insights from data), aligned with broader strategic goals.
- Implementation. The designers and engineers work together to implement a prototype of the change.
- Testing. The data scientists help set up an experiment to validate the hypothesis with data.
- Analysis. Everyone uses the platform to learn about the experiment's impact and whether it validated the hypothesis.
Sometimes, both inside and outside of Spotify, teams are more engineering-focused, with engineers responsible for implementation, testing, and analysis. In other cases, the team running the tests lacks engineering expertise of its own, so engineers help with the implementation on an ad hoc basis. The product team can then reuse that implementation to run multiple experiments without any additional engineering effort.
Regardless of team composition, these four phases immediately translate into steps in the experiment setup process:
- Hypothesizing determines the hypothesis of the test, the test's target audience, and the metrics the test must track.
- Implementation defines the flag and its variants, and adapts the code to use the flag through the SDKs (see the sketch after this list).
- Testing is the actual test itself, including sample size calculations to understand how much traffic the test needs to validate the hypothesis (a worked example also follows the list).
- Analysis is available to everyone through the platform, with shipping recommendations suggesting what the test results imply for the product change.
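To make the Implementation step concrete, here is a minimal sketch of what guarding a change behind a flag can look like in code. Everything in it is a hypothetical stand-in for illustration: the client class, the home-redesign flag, and the variant names are not the actual Confidence SDK API, and a real SDK resolves variants from the server rather than bucketing locally.

```python
import hashlib


class StubFlagClient:
    """Hypothetical stand-in for a flag SDK client.

    A real SDK resolves the variant from the experimentation platform;
    this stub buckets users deterministically just to show the pattern.
    """

    def resolve(self, flag: str, targeting_key: str, default: str) -> str:
        digest = hashlib.sha256(f"{flag}:{targeting_key}".encode()).hexdigest()
        return "new-layout" if int(digest, 16) % 2 == 0 else default


def render_home(client: StubFlagClient, user_id: str) -> str:
    # Resolve this user's variant; fall back to the control experience
    # if the flag cannot be resolved.
    variant = client.resolve("home-redesign", targeting_key=user_id, default="control")
    return "new layout" if variant == "new-layout" else "current layout"


print(render_home(StubFlagClient(), "user-123"))
```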
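For the Testing step, the sketch below works through the standard sample size calculation for comparing two proportions. It is the textbook formula, shown only to illustrate what goes into such a calculation; Confidence runs these calculations for you, and its actual methodology may differ.

```python
from scipy.stats import norm


def sample_size_per_group(p_baseline: float, mde: float,
                          alpha: float = 0.05, power: float = 0.8) -> float:
    """Users needed per variant to detect an absolute lift of `mde`
    on a conversion rate of `p_baseline` (two-sided test)."""
    p_treatment = p_baseline + mde
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for significance
    z_power = norm.ppf(power)          # critical value for power
    variance = p_baseline * (1 - p_baseline) + p_treatment * (1 - p_treatment)
    return (z_alpha + z_power) ** 2 * variance / mde ** 2


# Detecting a 1 percentage point lift on a 20% conversion rate takes
# roughly 25,600 users per variant:
print(round(sample_size_per_group(0.20, 0.01)))
```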
Because experimentation is a team sport, Confidence has several features that help you collaborate and perform these steps effectively.
Access control
Company cultures and legal requirements vary widely, from companies where anyone can do anything to ones where legal responsibilities mean that only certain people can hold certain permissions. Access control also helps you avoid mistakes, like accidentally launching an experiment or mistakenly disabling a feature flag. To support this wide range of requirements, Confidence offers a flexible permission model that lets you create groups and policies, assign owners, and share resources.
What this means in practice is that you can create groups of users and assign roles to these groups through policies. Consider the earlier example, where engineers create the flags and data scientists are responsible for setting up the experiments (known as workflows in Confidence). A set of policies can enforce these roles through permissions.
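As an illustration, such policies might be expressed along these lines. The schema here is a hypothetical sketch, not Confidence's actual policy format:

```python
# Hypothetical policy sketch: each entry binds a group to a role on a
# resource type. The field names are illustrative, not Confidence's schema.
policies = [
    {"group": "Engineers",       "role": "editor", "resource": "flags"},
    {"group": "Data Scientists", "role": "editor", "resource": "workflows"},
    {"group": "Everyone",        "role": "viewer", "resource": "*"},
]
```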
These policies enforce that only users in the Engineers group can edit flags, whereas only users in the Data Scientists group can edit workflows. Everyone can read everything. While policies let you assign permissions globally, you can also share individual resources whenever necessary. Consider, for example, an A/B test that's set up and owned by the Data Scientists group, but that an engineer on the team needs to be able to edit. By sharing the A/B test, you can give that colleague permission to edit it. Doing so automatically gives them permission to edit and view related resources, such as the metrics and flags the A/B test uses.
Surfaces
At Spotify, we use the concept of surfaces to organize experiments. A surface is a part of the product as your users see it, which lets you group experiments around the product rather than around teams. For example, surfaces at Spotify include Search for the search functionality and Home for the home screen of the app. The work on these surfaces often spans multiple teams and initiatives, and grouping experiments by surface makes that work visible to everyone. An organization's structure also tends to change more often than the key parts of its product, so organizing experiments around the product itself gives you more stability than organizing them around your current team structure.
Each surface in Confidence has a timeline view, which gives you a visual overview of what's happening on your surface. You can see when a new test is coming up and when your currently running experiments are planned to end. The timeline is also a helpful tool for customer support: if they see a surge in bug complaints for the search functionality, they can quickly check whether a running experiment could be the cause.
Reviews and Comments
Running successful experiments often means a lot of back-and-forth discussion. You need to make the right trade-offs, decide what to measure and who to run the experiment for, and understand the results. These discussions are important for learning and improving, and help you eventually nail your experiments. Confidence makes it possible to have these discussions right at their source: your experiment.
Just like reviews are central to certain types of work, such as code changes or important documents, experiments can benefit from reviews too. In Confidence, reviewing resembles the familiar code review experience on version control platforms like GitHub. Consider, for example, a colleague who asks you to review the experiment they've set up so that you can help them make sure they did it correctly. The image below shows what that experience looks like in Confidence: the wrong variant is assigned as the control group, and the test has no planned runtime set. The review requests changes and tags the appropriate people. You discuss in threads, and resolve them when you settle the discussion.
The review process lets you leave an overall comment together with the outcome of your review. If you're the one requesting a review, you can send requests to your colleagues, and they show up on their to-do lists on the Confidence home page. The to-do list also includes comment threads where you've been tagged.
Decisions
After your test ends, you need to decide what to do next: should you release the change, try something new, or scrap the idea altogether? While your reasoning is top of mind now, it won't always be. For future reference, write down and save your conclusions from the test so that you and your coworkers know what the experiment led to and why. In Confidence, you can keep track of experiment outcomes by recording decisions together with summaries that include additional context. This leaves a trace for the future and saves you from having to remember, long after the fact, what you did and why. Others can also comment on the conclusion you write if they have follow-up questions.
Work together and experiment better
At Spotify, we have learned over the years that a strong experimentation culture requires good tools for collaboration and, frankly, administration. With access control and surfaces, we minimize the risk of mistakes and misunderstandings. With reviews and comments, teams can iterate on experiments fast. Altogether, this enables you to scale experimentation to thousands of experiments.
Confidence is currently available in private beta. If you haven't signed up already, sign up today and we'll be in touch.