Experiment like Spotify: With Confidence

Johan Rydberg, General Manager
Sebastian Ankargren, Senior Data Scientist

If you want to experiment like Spotify, check out our experimentation platform Confidence and get a personalized demo.

We love experimentation at Spotify. We've been learning from experiments daily for more than 10 years, and over this time have outgrown our experimentation tooling more than once. Right now, hundreds of teams run tens of thousands of experiments at Spotify. With more than half a billion end users distributed across 186 global markets and thousands of device types, we face an enormous amount of complexity in our experiments.

It's become clear to us over the years that there's no one-size-fits-all approach to experimentation. Needs differ, and you likely don't face the same requirements in every part of your organization. To adopt an experimental mindset and evolve your business into an experiment-first culture, you need to set yourself up for success by being able to meet these different requirements. To make things even more challenging, your needs are going to change. This means you shouldn't go looking for a tool that's perfect for you now. You need a tool that you can make perfect for you, and that you can grow with.

With Confidence, it's build and buy — not build versus buy

As our experimentation needs at Spotify have evolved, we've continuously evaluated what the market has to offer. We've chosen to invest in building our own experimentation platform, because no tool has been able to meet our constantly changing requirements. Building your own platform is an enormous investment, and one that's easy to underestimate. Part of the problem is that, for a long time, the choice with commercial experimentation platforms has been build versus buy, with no viable alternative in between. With Confidence, we're turning the build-vs-buy dichotomy into a build-and-buy continuum.

Confidence is a modern, warehouse-native experimentation platform that's with you every step of the way. Its APIs are built on our years of experience running large experimentation programs in a complex, multifaceted environment. It's flexible and adapts to your changing needs: start with more hand-holding and let us manage the platform for you, then manage it yourself when you need more customization. Or offload certain parts to us while you build the rest of your in-house experimentation platform. Using Confidence to build and buy means you can outsource the parts of your platform where your needs aren't unique, and focus your efforts on the parts you need to get right to make experimentation successful in your business. Confidence is the platform we always wanted in the market, but that never existed. Until now.

Use Confidence in a way that fits you right now

As we first described when we announced Confidence, you can use Confidence in three different ways:

  • Managed service. Want to get up and running quickly and with the lowest technical overhead? Use Confidence like any other SaaS product, managed by our team.
  • Backstage plugin. Already have a Backstage instance running (or want to get started)? Get all the features of Confidence as a plugin next to your other developer tools, while we still manage the backend for you. We run our experimentation platform like this at Spotify.
  • APIs. Need more customization? Want to build a bandit or do switchback testing? Integrate the Confidence platform into your own infrastructure with maximum flexibility and extensibility. Confidence provides you with the capabilities to do what you need to do.

Managed service: let us take care of everything for you

The managed service comes packed with all the best practices we've developed at Spotify, and lets you experiment just like we do. It includes our implementations of, and ideas about, how experiments should be run: what makes a successful experiment, what defines an A/B test, and how exactly a rollout works. You don't need to host anything; we take care of all the infrastructure for you. The managed service is a great option if you want to get started quickly with high-quality experimentation. If you outgrow it because you need more customizability, you can easily move to the Backstage plugin or the APIs to take your experimentation to the next level.

Backstage plugin: run Confidence next to your other developer tools

If you get started with our managed service, you'll be up and running quickly and can focus on building a thriving experimentation culture. At some point, you'll have built that culture, and with it you'll have more, and more complex, requirements for your experimentation tool. Maybe you need tighter integration with other systems you work with, or you need to define your A/B tests differently to meet your business needs. Confidence gives you this flexibility in how you run and present your experiments. By running Confidence as a Backstage plugin, you're in full control of the user interface: you get the source code to our UI layer, so you can customize, extend, and adapt it to what makes sense for you. By defining your own workflows, you can create the types of experiments you need. You have full access to the code for our workflow implementations of A/B tests and rollouts, so you can use ours as templates and turn them into what's right for you. Plus, this all happens within your Backstage instance, so you don't waste energy context switching and navigating between tools. All these capabilities live in your typical developer workflow.
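
To make the idea of a workflow as a reusable template more concrete, here is a minimal Python sketch. It is not Confidence's workflow code or API: the `Workflow` class, its stages, and the `ab_test` and `rollout` helpers are hypothetical names chosen for illustration. What it shows is that an A/B test and a rollout can share one template and differ only in how they allocate traffic over time.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical workflow template: an ordered list of named stages.

    Each stage sets the share of traffic (0.0 to 1.0) exposed to the treatment.
    """

    name: str
    stages: list[tuple[str, float]] = field(default_factory=list)
    current: int = 0

    def allocation(self) -> float:
        # Share of traffic exposed at the current stage.
        return self.stages[self.current][1]

    def advance(self) -> str:
        # Move to the next stage; stay on the last stage once it is reached.
        if self.current < len(self.stages) - 1:
            self.current += 1
        return self.stages[self.current][0]


def ab_test() -> Workflow:
    # Hypothetical A/B test definition: a single stage with a fixed 50/50 split.
    return Workflow("ab_test", [("running", 0.5)])


def rollout(steps: list[float]) -> Workflow:
    # Hypothetical rollout definition: the same template, but exposure ramps
    # step by step and finishes at full exposure.
    stages = [(f"ramp_{round(s * 100)}", s) for s in steps]
    stages.append(("complete", 1.0))
    return Workflow("rollout", stages)
```

For example, `rollout([0.01, 0.1, 0.5])` starts at 1% exposure, and each call to `advance()` moves to the next ramp step until the rollout reaches 100%. Customizing a workflow then means changing the stages or the logic in `advance()`, not rewriting the whole thing.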

APIs: pick and choose what you need and focus your efforts on what's most impactful

Both the managed service and the Backstage plugin are built on a set of APIs that we offer, and all of them are available to you. This means you can use Confidence as a true platform on which to build exactly what you need. For example, maybe you've built your own experimentation platform, but you no longer have the capacity to support and develop your own statistics engine for analyzing experiment results. Use our statistics APIs and let Confidence do it for you. Or maybe your situation is the opposite: your experiments need to be analyzed differently, and the statistics engine is the only part you want to keep in-house, because that's where your needs are different. Confidence and its APIs let you focus your investments where the impact is highest, and avoid wasting time building tools that don't solve challenges unique to you.
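
As an illustration of the kind of computation a statistics engine performs, here is a minimal sketch of the most basic analysis: the difference in means between treatment and control, with a confidence interval based on a normal approximation. It uses only the Python standard library and is not the Confidence statistics API; the function name `diff_in_means_ci` is hypothetical. A production engine layers much more on top, such as variance reduction, ratio metrics, and sequential testing.

```python
import math
from statistics import NormalDist, fmean, variance


def diff_in_means_ci(
    control: list[float], treatment: list[float], alpha: float = 0.05
) -> tuple[float, tuple[float, float]]:
    """Estimate treatment minus control with a (1 - alpha) confidence interval.

    Uses a normal approximation with unpooled sample variances, which is
    reasonable for the large samples typical of online experiments.
    """
    diff = fmean(treatment) - fmean(control)
    # Standard error of the difference between two independent means.
    se = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    # Two-sided critical value, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, (diff - z * se, diff + z * se)
```

Each list holds one metric value per user in that group. If the interval excludes zero, the difference is significant at level `alpha`, but only for a single, fixed-horizon test; peeking at results repeatedly is what sequential tests exist to handle.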

The best part about the different ways you can use Confidence is that you can use what's right for you now, without having to pay the price for that later. Your needs will inevitably change. With Confidence, you can easily grow and customize your experimentation tooling to what you need at each point in time.

The components of Confidence

Confidence consists of a set of components that let you do everything you need to run experiments:

  • Flags: use feature flags to create different experiences. Flags in Confidence aren't just boolean on-off switches, but configurations defined using a schema. Clients, such as apps and websites, resolve flags and get the full configuration back so you can directly apply these values in your code. Flags offer sophisticated targeting and coordination capabilities, low-latency and batch resolving, and versatile rules that decide what variant to return.
  • Events: write events into your data warehouse. To understand whether your experiments succeed in changing behavior, you first need to measure baseline behavior. Use Confidence Events if you lack a way of writing events into your data warehouse, so that you can tell whether your ideas have an impact. You define your events with a data contract, so you can be certain the data you get is of high quality.
  • Metrics: calculate metrics for your experiments. To evaluate your experiments, you need to calculate metrics. Confidence Metrics takes care of these calculations for you and is warehouse-native: the calculations happen in your data warehouse, so you have full transparency into what is calculated and you don't have to worry about privacy concerns. Confidence Metrics supports multiple types of metrics, including sum, count, and click-through rate metrics, and also supports variance reduction.
  • Stats: analyze the results of your experiments. Confidence Stats includes rigorous implementations of everything you need to analyze experiments according to the latest and best research. Analyze different types of metrics, including ratio metrics, with variance reduction to boost the efficiency and speed of your experiments. Use sequential tests, including both group sequential tests and always-valid inference, to view results while your experiment is live without compromising the integrity of the results. Construct your own decision rules and get multiple-testing adjustments to both the false positive rate and power that are tailored to what matters: the product decision. Run power analyses that take all of this into account to understand how much traffic you need.
  • Workflows: implement your own types of experiments, or use ours. Workflows are the orchestrators of experimentation that define what really happens when you go live with an experiment. We've implemented our definitions of A/B tests and rollouts as workflows, and you get the code. By using our implementations as templates, you can easily tweak them to define your own versions of A/B tests and rollouts. Or maybe you want to do something different altogether, like implementing a certain type of bandit. You can create your own custom workflows that help you run the experiments that make an impact for you.
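
Resolving a flag means mapping a unit, such as a user, to one variant's full configuration. A common way to do this, shown here as a sketch, is to hash the flag name together with the unit ID into a fixed number of buckets and give each variant a range of buckets. This is an assumption for illustration, not a description of how Confidence resolves flags; the `FLAG` structure and the `bucket` and `resolve` functions are hypothetical.

```python
import hashlib

# Hypothetical flag: each variant is a full configuration (not just on/off)
# and owns a share of the BUCKETS available buckets.
BUCKETS = 10_000

FLAG = {
    "name": "home-screen",
    "variants": [
        # (variant name, number of buckets, configuration values)
        ("control", 5_000, {"shelf_count": 3, "show_banner": False}),
        ("treatment", 5_000, {"shelf_count": 5, "show_banner": True}),
    ],
}


def bucket(flag_name: str, unit_id: str) -> int:
    # Hash the flag name and unit together so assignment is stable per unit
    # and independent across flags.
    digest = hashlib.sha256(f"{flag_name}:{unit_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def resolve(flag: dict, unit_id: str) -> tuple[str, dict]:
    # Walk the variants' bucket ranges and return the matching configuration.
    b = bucket(flag["name"], unit_id)
    start = 0
    for name, share, config in flag["variants"]:
        if b < start + share:
            return name, config
        start += share
    raise ValueError("variant shares do not cover all buckets")
```

Because the hash includes the flag name, the same user lands in independent buckets for different flags, while repeated resolves of the same flag always return the same variant and configuration.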
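
The power analysis mentioned under Stats answers how much traffic an experiment needs. As a minimal sketch, the function below uses the textbook sample-size formula for a two-sided z-test on a difference in means, n = 2 (z_(1 - alpha/2) + z_power)² σ² / mde² per group, assuming equal variance in both groups. The name `sample_size_per_group` is hypothetical, and a fuller power analysis would also account for factors such as sequential testing, multiple-testing adjustments, and variance reduction.

```python
import math
from statistics import NormalDist


def sample_size_per_group(
    sigma: float, mde: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Users needed per group to detect an absolute effect `mde`.

    Assumes a two-sided z-test on a difference in means and the same
    standard deviation `sigma` in control and treatment.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    n = 2 * (z_alpha + z_power) ** 2 * sigma**2 / mde**2
    # Round up: a fractional user still requires one more whole user.
    return math.ceil(n)
```

For a metric with standard deviation 1 and a minimum detectable effect of 0.1, this gives 1,570 users per group; halving the effect you want to detect roughly quadruples the required traffic.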

These components are modular with no dependencies among them. Pick the pieces that help you solve your problems, and use what you've already built wherever you need.

What's next

We're excited to give you more insight into what Confidence has to offer and how it can help you on your experimentation journey. We've learned a lot from running many experiments in plenty of different settings with varying degrees of complexity in the requirements we've faced — technically and culturally. This post is the beginning of a series of blog posts in which we'll share more on what Confidence is all about, and how what we've learned has shaped our view of experimentation. Coming up in the series are posts on A/B tests and rollouts, flags, analysis of experiments, metrics, workflows, and more.

If you haven't signed up already, sign up today.