04 March 2014

When A/B Testing steers you wrong

Anyone who considers herself analytical and data-driven can expound at length on why A/B testing is awesome. It's becoming an accepted premise in many companies that changes need to be tested on real users before they're rolled out, which is a huge advance over no testing at all. But we should also consider the weaknesses of A/B testing and think about how to address them.

A/B testing is a kind of hill-climbing

Hill climbing algorithms are neat: imagine you're a blind man trying to get to the top of a hill. You don't know which direction is up, so you take a few tentative steps in each direction to find out. Once you've determined how to gain the most altitude, you set off in that direction a bit. Then you repeat: determine which way is up, move that way.
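To make the analogy concrete, here's a minimal sketch of that loop in Python. The altitude function and every number in it are made up for illustration; the point is just the probe-then-step structure.

    def altitude(x):
        # A made-up landscape with two peaks: a small hill near x = 2
        # and a taller mountain near x = 8, separated by a flat valley.
        return max(0.0, 3 - (x - 2) ** 2) + max(0.0, 10 - (x - 8) ** 2)

    def hill_climb(x, step=0.1, rounds=1000):
        for _ in range(rounds):
            # Probe one step in each direction and keep whichever probe
            # gains altitude; stop when neither direction goes up.
            best = max([x - step, x + step], key=altitude)
            if altitude(best) <= altitude(x):
                return x  # a peak: no direction improves things
            x = best
        return x

    print(hill_climb(6.5))  # ends near x = 8, the mountain
    print(hill_climb(1.0))  # ends near x = 2, the small hill

Notice that where you end up depends entirely on where you start: the climber that begins at x = 1.0 never sees the mountain, which is exactly the problem discussed below.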

A/B testing helps you determine what the best site is in the same way the blind man determines which way is up. You test a few variations on a design and see which one users respond to best. Keep the winner, then test some more variations. Do this long enough and you should reach the maximum.
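The "which way is up" probe here is the A/B test itself: show each variation to a slice of users and keep whichever converts better. Here's a minimal sketch of that comparison, using a standard two-proportion z-test and entirely made-up traffic numbers:

    from math import sqrt

    def pick_winner(n_a, conv_a, n_b, conv_b, z_threshold=1.96):
        # Compare two variants by conversion rate; the two-proportion
        # z-test says whether the observed difference is likely real.
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        if abs(z) < z_threshold:
            return None  # no clear winner yet; keep collecting data
        return "A" if p_a > p_b else "B"

    # Made-up numbers: 10,000 visitors per variant, 8.5% vs 7.5% conversion.
    print(pick_winner(10000, 850, 10000, 750))  # "A", z is about 2.6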

But what if you're not on the right hill?

It's all well and good to walk to the top of the hill you're on. But it's possible that you're standing next to a mountain, and hill climbing will never get you there, because every path to the mountain leads downhill first. So before you start A/B testing color schemes and 2px differences in button size, make sure you're on the mountain.

Here's an example. Suppose you're looking for a 15% improvement in booking rate. You can A/B test a lot of things: different color schemes, variations on button sizes, more messaging on each page. But even if you combine them and pick the winner in each case, those changes probably won't add up to 15%. Before you spend your time testing them, get on the right mountain! Find out what you can do to make your customers excited about your product. Is it even the right product? Once you've gotten 12% out of your 15%, you may be able to make up the rest with better design or by drawing more attention to certain buttons.
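To see why the cosmetic wins rarely get you there, assume (purely for illustration) that the color, button, and messaging tests each deliver a 3%, 2%, and 4% lift. Even if those wins compound cleanly, the total falls well short of 15%:

    # Hypothetical lifts from three cosmetic tests: 3%, 2%, and 4%.
    lifts = [0.03, 0.02, 0.04]

    combined = 1.0
    for lift in lifts:
        combined *= 1 + lift

    # Prints "combined lift: 9.3%"; in practice the wins often overlap,
    # so the real total would be lower still.
    print(f"combined lift: {combined - 1:.1%}")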

The results of A/B tests do not necessarily obey the transitive property

Suppose you run two A/B tests, serially. You find the winner of the first test, make the change, then find the winner of the second test and make that change too. In each test you're making a slight variation to what was once a beautiful, cohesive design, and a change that wins on its own metric may still hurt the experience as a whole; nothing in the process ever compares the combined result against where you started. If you don't take time to think about what you're A/B testing, or why, your site can end up deviating significantly from the image you were trying to project.

A/B testing is not enough

There's an old saying that a million monkeys typing away for an infinite amount of time would eventually, by sheer coincidence, produce the complete works of Shakespeare. But you don't have an infinite workforce or an infinite amount of time to wait. So be smarter about where you apply your A/B testing effort. Poke your head above the clouds and see if you can spot a higher mountain; then look down and make sure you're actually travelling up the hill you're on.