Feature lists tend to oversimplify experimentation offerings. Not all implementations of the same feature are alike, and a vague RFP leads to frustration and friction when the platform does not deliver what the checkmark promised. Here we describe how we would specify an RFP for various capabilities, and what to look for to ensure the implementation is worth committing to.
Topics where we have a connected implementation and can show what that looks like.
Most experimentation platforms have a sample size calculator. Almost none connect it to the analysis method the experiment will actually use. Here is what to ask instead of "do you have a calculator?"
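To make the disconnect concrete, here is a minimal sketch in Python with made-up numbers. The function name and the cuped_rho parameter are ours, not any vendor's API; the point is that a calculator that knows the analysis will apply CUPED-style variance reduction gives roughly half the answer of one that does not.

```python
from scipy.stats import norm

def required_n_per_arm(sigma, mde, alpha=0.05, power=0.8, cuped_rho=0.0):
    """Sample size per arm for a two-sided two-sample z-test on a mean,
    using the variance the analysis will actually see after CUPED."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = sigma**2 * (1 - cuped_rho**2)  # analysis-time variance, not raw variance
    return 2 * var * z**2 / mde**2

# Disconnected calculator: ignores that the analysis will apply CUPED.
print(round(required_n_per_arm(sigma=1.0, mde=0.05)))                 # ~6279
# Connected calculator: knows the pre-period correlation is ~0.7.
print(round(required_n_per_arm(sigma=1.0, mde=0.05, cuped_rho=0.7)))  # ~3202
```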
Every platform claims to support sequential testing. The claim is almost always incomplete. Here is what to ask instead of "do you have sequential testing?"
Every platform lets you add multiple metrics. Most will display results for each one. What they rarely tell you is what to do next. Here is what to ask.
Every platform says it corrects for multiple comparisons. Most do, partially. Here is what to ask instead of "do you correct for multiple testing?"
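A sketch of the "partially" problem, using statsmodels and hypothetical p-values for three variants and four metrics: a Benjamini-Hochberg correction run within each variant passes a result that the same correction run across all twelve tests does not.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values: three variants, four metrics each.
pvals = {
    "variant_a": [0.003, 0.04, 0.2, 0.6],
    "variant_b": [0.010, 0.30, 0.5, 0.9],
    "variant_c": [0.040, 0.20, 0.7, 0.8],
}

# Partial correction: Benjamini-Hochberg within each variant only.
for name, p in pvals.items():
    reject, *_ = multipletests(p, alpha=0.05, method="fdr_bh")
    print(name, reject)  # variant_b's 0.010 survives this version

# Full correction: BH across all twelve variant-by-metric tests at once.
reject, *_ = multipletests(np.concatenate(list(pvals.values())),
                           alpha=0.05, method="fdr_bh")
print("pooled", reject)  # only the 0.003 result survives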
Every platform that offers variance reduction claims it cuts runtime by 20-50%. What is usually missing is how far that reduction actually reaches. Here is what to ask.
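A minimal simulation of the reach question, on made-up data: CUPED-style adjustment removes roughly ρ² of the variance from a metric with a strong pre-period signal, and essentially nothing from a metric with no pre-period history.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
pre = rng.normal(10, 3, n)                # pre-experiment covariate
post = 0.8 * pre + rng.normal(0, 2, n)    # metric with a strong pre-period signal
new = rng.normal(5, 2, n)                 # metric with no pre-period signal

def cuped(y, x):
    # Standard CUPED adjustment: subtract the covariate-explained part.
    theta = np.cov(y, x)[0, 1] / x.var()
    return y - theta * (x - x.mean())

print(post.var(), cuped(post, pre).var())  # ~9.8 -> ~4.0: variance cut ~59%
print(new.var(), cuped(new, pre).var())    # ~4.0 -> ~4.0: no reach here
```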
Revenue per user. Streams per session. Most platforms support ratio metrics. Most get the variance wrong. Here is what to ask instead of "do you support it?"
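A sketch of the two variances, on simulated per-user data: averaging each user's personal ratio both estimates a different quantity (a mean of ratios, not a ratio of means) and reports the wrong uncertainty. The delta method handles the ratio of two correlated means directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
sessions = rng.poisson(5, n) + 1     # per-user session counts (randomization unit: user)
streams = rng.poisson(2 * sessions)  # per-user stream counts, correlated with sessions
r = streams.sum() / sessions.sum()   # the metric: streams per session

# Wrong: treat each user's personal ratio as an i.i.d. observation.
naive_se = np.std(streams / sessions, ddof=1) / np.sqrt(n)

# Delta method: variance of a ratio of two correlated per-user means.
cov = np.cov(streams, sessions, ddof=1)
var_r = (cov[0, 0] - 2 * r * cov[0, 1] + r**2 * cov[1, 1]) / (n * sessions.mean() ** 2)
print(naive_se, np.sqrt(var_r))
```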
Every platform gives you a sample size estimate before the experiment starts. Almost none revisit it after. Here is what to ask about during-experiment power monitoring.
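A sketch of the recheck, with hypothetical numbers: the plan assumed a standard deviation of 1.0, the experiment is observing 1.4, and nobody told the dashboard.

```python
from scipy.stats import norm

def achieved_power(n_per_arm, sigma, mde, alpha=0.05):
    # Power of a two-sided two-sample z-test at the variance actually observed.
    se = sigma * (2 / n_per_arm) ** 0.5
    return norm.sf(norm.ppf(1 - alpha / 2) - mde / se)

print(achieved_power(6_279, sigma=1.0, mde=0.05))  # ~0.80, as planned
print(achieved_power(6_279, sigma=1.4, mde=0.05))  # ~0.52, quietly underpowered
```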
Every platform measures user behavior after exposure. The question most buyers never ask is: over what time period? Here is what to ask about observation windows.
Every platform lets you look at results. The question is whether the platform looks for you. Here is what to ask about monitoring and alerting.
Most platforms let you randomize by account or store. The question is what happens in the analysis after randomization. Here is what to ask.
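A minimal simulation of why the analysis step matters, with hypothetical account data: a user-level standard error after account-level randomization understates the uncertainty several-fold compared with analyzing one observation per randomization unit.

```python
import numpy as np

rng = np.random.default_rng(6)
accounts = 200
sizes = rng.poisson(50, accounts) + 1        # ~50 users per account
account_effect = rng.normal(0, 1, accounts)  # shared within-account component
values = np.concatenate([ae + rng.normal(0, 1, s)
                         for ae, s in zip(account_effect, sizes)])
labels = np.repeat(np.arange(accounts), sizes)

# Wrong: user-level SE, as if users had been randomized independently.
naive_se = values.std(ddof=1) / np.sqrt(values.size)

# Account-level analysis: one observation per randomization unit.
account_means = np.array([values[labels == a].mean() for a in range(accounts)])
cluster_se = account_means.std(ddof=1) / np.sqrt(accounts)
print(naive_se, cluster_se)  # the clustered SE is several times larger
```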
When a user generates zero events, the platform makes a choice that changes what the experiment measures. Most vendors do not document which choice they make.
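A sketch of the choice, on made-up data: fill the zeros in and you measure revenue per exposed user; drop them and you measure revenue per active user. A treatment that changes the share of active users moves the two numbers differently.

```python
import numpy as np

rng = np.random.default_rng(2)
exposed = 10_000                     # hypothetical exposed users
active = rng.random(exposed) < 0.6   # 40% never generate an event
revenue = np.where(active, rng.exponential(20, exposed), 0.0)

print(revenue.mean())          # ~12: revenue per exposed user (zeros filled in)
print(revenue[active].mean())  # ~20: revenue per active user (zeros dropped)
```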
Every platform lets you slice by dimension. The question is whether the platform controls the false positive rate when you do.
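A ten-line simulation makes the risk concrete: slice a no-effect A/A test into 20 segments at the usual 5% level, and some segment comes up significant in roughly two out of three experiments.

```python
import numpy as np

rng = np.random.default_rng(3)
sims, segments = 10_000, 20
z = rng.standard_normal((sims, segments))  # A/A test: every segment is truly null
any_hit = (np.abs(z) > 1.96).any(axis=1)   # any segment "significant" at 5%?
print(any_hit.mean())                      # ~0.64
```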
Read moreTopics where we deliberately chose not to ship the feature, and what to look for if you need it.
Most vendors support percentile metrics. Most implementations break down exactly when you need them. Here is what to ask instead of "do you support it?"
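One failure mode, sketched on simulated latency data: bootstrapping events as if they were independent understates the uncertainty of a p95 when events cluster within users; resampling whole users is slower but honest. All names and numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
users = 2_000
effects = rng.lognormal(0, 0.5, users)  # per-user latency level
events = [e * rng.lognormal(0, 0.2, rng.integers(5, 50)) for e in effects]
latency = np.concatenate(events)

boot_iid, boot_user = [], []
for _ in range(500):
    # Shortcut: resample events as if they were independent.
    boot_iid.append(np.percentile(rng.choice(latency, latency.size), 95))
    # Honest: resample whole users, preserving within-user correlation.
    idx = rng.integers(0, users, users)
    boot_user.append(np.percentile(np.concatenate([events[i] for i in idx]), 95))

print(np.std(boot_iid), np.std(boot_user))  # the user-level SE is much larger
```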
Most experimentation programs do not need geo-lift. If you do, the question is whether the platform forces you to confront the assumptions that make or break the analysis.
Switchback experiments exist because standard A/B tests break down when users interact with each other. Only two vendors offer support. Here is what to ask.
The label "Bayesian" does not tell you what stopping rule is used. Here is what to ask instead.
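A sketch of why the label is not enough, on an A/A simulation with flat priors: a "stop when the posterior probability to beat control crosses 95%" rule, checked every 1,000 users, declares winners far more often than the nominal rate suggests. The priors, peek schedule, and horizon below are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=2_000):
    # Flat Beta(1,1) priors; Monte Carlo estimate of P(rate_B > rate_A).
    a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
    b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
    return (b > a).mean()

# A/A test: identical 5% conversion, peeking every 1,000 users per arm.
stops, sims = 0, 200
for _ in range(sims):
    a = rng.random(20_000) < 0.05
    b = rng.random(20_000) < 0.05
    for n in range(1_000, 20_001, 1_000):
        p = prob_b_beats_a(a[:n].sum(), n, b[:n].sum(), n)
        if p > 0.95 or p < 0.05:  # "95% probability to beat control" rule
            stops += 1
            break
print(stops / sims)  # well above the nominal 5% under continuous peeking
```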