8 Best Practices For Starting Your A/B Testing

July 8, 2015

I’m often asked by companies who are looking to start up with A/B testing what best practices they should be aware of. My answers vary by time of day, week, season, and I try and give different answers to gauge the response to my advice. So far my human experimentation has proven the following:

1. It’s Science

“I believe that if we use a different color blue on that button, more people will click it.”

Start with a hypothesis and then try and prove or refute it. Don’t just do something to see what happens across tons of different metrics. If you randomly look at enough metrics, then your test has a greater and greater chance of coming back looking like it affected something, even if it really didn’t.

Did you hear about that fake chocolate study? They actually did the study, but one of the “bad” science things they did was not specify what they expected to change. They measured 18 different metrics in the study, and had roughly a 60% chance that at least one of them would come back as having been changed. They got lucky and got the weight metric as changing. “Eat chocolate and lose weight.”
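Here's a rough sketch of where that 60% comes from. It assumes each metric is checked at the usual 5% significance level and that the metrics are independent of each other (real metrics rarely are, so treat it as a ballpark). The function name is just something I made up for illustration.

```python
def chance_of_false_positive(num_metrics: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_metrics independent metrics
    looks significant purely by luck, at significance level alpha."""
    # Each metric has a (1 - alpha) chance of correctly showing nothing,
    # so all of them stay quiet with probability (1 - alpha) ** num_metrics.
    return 1 - (1 - alpha) ** num_metrics


for n in (1, 5, 18):
    print(f"{n:>2} metrics: {chance_of_false_positive(n):.0%}")
# 18 metrics comes out at about 60%
```

One metric keeps you at 5%. Eighteen metrics turns a "significant" result into something closer to a coin flip.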

Think about what you want to do.

“We want more people to add an item to their cart.”
“I want our users to buy more shoes.”
“I’d like more users to download the help guides.”

Make sure you’re tracking that activity correctly via events, pageviews, or transactions, and then implement your test looking at just THAT specific metric. Don’t change the button to increase the conversion rate, but then focus on the fact the bounce rate increased.

2. Start With One Variable

“What happens if we change the button to blue, but also have different pictures of animals on the page, like dogs, or cats, or manatees?”

You can do multivariate tests (MVT) these days on different testing platforms, but if you’re just getting started, or you don’t have a ton of traffic, try testing just one variable at a time. Do one change, the button, or the header, but don’t do both. If you test a bunch of things and really get into the swing of this whole testing thing, and have enough traffic, sure, eventually do your site-wide multivariate tests. But for now, keep it simple.

Change one thing, and see if your KPI metric for it changes in a statistically significant way. If you’re just starting out you might be able to get some easy early wins, and knock off some low hanging fruit. There really is no reason to get too complicated right off the bat, because you risk getting confused, and making your organization hesitant to continue testing.

Go ahead and have lots of variations on that one variable, but when it comes to MVTs, press the pause button, at least at first. Walk before you run.

3. Test from One to Two Weeks

Try and keep your tests restricted to between 1 and 2 weeks. In my experience you want to test a full week generally to account for weekly/daily variations in traffic and behavior. Maybe your test shows dramatic improvement on a Saturday/Sunday, but in general the opposite during the week. If you’re going to make global changes make sure you test at least a full week.

On the other end, if you test more than two weeks it’s too broad a span of time. Too many different things are happening. The weather is changing, there is an arts festival in town now, the baseball season is over. Who knows. Test too broad a span of time and things will change.

If you ultimately do find that people are converting differently on the weekend in your test, try tests in micro ranges like just the weekend, or just in the afternoon. Maybe you will find certain variations DO work better at different times of the week or the day. If so, and you’re able, you can modify the content on your site on a daily or hourly level to account for it.

But if you’re looking to make global changes, make sure you test from 1 to 2 weeks.

4. Test Seasonally

You ran a test for a week and got a good result, and changed your site. Don’t toss that test out yet. Wait and repeat the test a month later, 3 months later, 6 months later. Make sure that your result was valid and that it wasn’t seasonal. If you test something for a week once then you have 95% certainty (hopefully, we’ll get to that) that it’s correct. If you test that same test once a quarter for a year and every result comes back positive then you’re in much better shape.

But just like above, you could have seasonal variations depending on what you’re testing. If you consistently find that certain things work in Winter, but not in Summer, you can use that to dynamically change your site depending on the weather. However, if you’re looking for a global change, then be sure it makes sense to do so at all times of the year.

(BTW if you get one good global result, I’d feel confident making that change, just come back and test it again in the future.)

5. Math it. Math it REAL GOOD.

Maybe the testing service you are using provides the actual math (like Google Analytics), but whatever you do be aware of the important numbers for statistical significance and what they mean. Maybe you don’t need to really understand what a p-value is, but try and understand that we’re looking at probability and certainty/uncertainty. If something comes back as a positive test result keep in mind that it’s basically saying that it’s PROBABLY true, not that it’s REALLY true.

95% certainty is the math saying “We’re 95% certain that the hypothesis you were testing is probably true.” It’s not an absolute yes or no.

Also try and keep in mind that the math itself has no idea if you set up your test right. If you set the test up wrong, or look at the wrong variables, or too many, or whatever, the math doesn’t know whether your actual science was good or not. All it is doing is crunching numbers that you hand it. Make sure you hand it the right numbers, and when speaking about the tests never talk with 100% certainty.
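If you're curious what that number crunching can look like, here's a minimal sketch of one common approach, a pooled two-proportion z-test. I'm not claiming this is what your testing tool does under the hood; the function name and the visitor counts are made up for illustration.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled two-proportion z-test (normal approximation)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: what we'd expect if A and B really converted the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a gap at least this big, in either direction, by luck.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical test: A converts 200 of 5,000 visitors, B converts 250 of 5,000.
p = two_proportion_p_value(200, 5000, 250, 5000)
print(f"p-value: {p:.4f}, significant at 95%: {p < 0.05}")
```

Notice the function only ever sees four numbers. It has no idea whether your tracking was broken or whether you picked the metric after peeking at the results.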

6. Watch What You Change

I’ve seen big test results come back with 99% certainty that a change was positive, and have the change roll into the wild, and have the exact opposite happen. If you test something, and get a good result, don’t just push it to everyone and walk away. Watch the conversion rate after you’ve pushed it live to see how it compares to the test. Don’t come back a month later and go “Wait, I thought this was supposed to make things better, why is it worse?” Track and report on your conversions week over week after the changes are put into place.

One great way to do this: certain applications will allow you to pick a winner and send 100% of the traffic there. If you have a winner, choose it to get 100% of the traffic for a week or two afterwards. If it looks good you can implement the change permanently after that point. If things go south, you can roll it back much more easily.

7. Don’t Get Fired

Test as few people as you can get away with. If you’re testing major transactions on your site and test 100% of your audience, 50% might get a crappy variation, and their transactions plummet, and suddenly your company is out lots of money.

Test on a select segment of your audience. If you can get away with 5% or less and still get a result within 1-2 weeks, you’re not risking losing a ton of money, only the 2.5% of the audience getting the B variation (as opposed to 50%).

To put that in real numbers, imagine you made 1 million dollars a week in ecommerce revenue, and your variation cut revenue in half for everyone who saw it. Test 100% of the audience and 50% of them get the bad variation and lose half their revenue, or $250,000. Or you could test 5% of the audience, where only 2.5% of the audience get the bad variation, and you are out $12,500. Sure that still sucks, but if I can risk losing only 12.5k as opposed to 250k then why not?
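The arithmetic is just weekly revenue, times the share of the audience in the test, times the share of those who see the bad variation, times how much revenue that variation loses. A quick sketch, with a function name and defaults that are my own assumptions (an even 50/50 split, and a variation that halves revenue):

```python
def revenue_at_risk(
    weekly_revenue: float,
    test_share: float,
    variation_share: float = 0.5,  # assumes an even A/B split
    revenue_drop: float = 0.5,     # assumes the bad variation halves revenue
) -> float:
    """Weekly revenue lost if the B variation turns out to be a dud."""
    return weekly_revenue * test_share * variation_share * revenue_drop


for share in (1.00, 0.05):
    loss = revenue_at_risk(1_000_000, share)
    print(f"Testing {share:.0%} of the audience risks ${loss:,.0f} a week")
```

Shrinking the test from 100% to 5% of the audience shrinks the downside by the same factor of 20.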

If you don’t have a ton of traffic this gets harder to do. You’ll need to determine the smallest audience you can test and still get a valid result in 1-2 weeks. Unfortunately, the more ‘fine grained’ the difference you’re trying to detect, the larger the audience you’ll need. There are online calculators that can help you determine this number. We even have a basic stats blog post that goes into more detail.
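For a feel of what those calculators are doing, here's a sketch using the standard normal-approximation formula for comparing two conversion rates. Different calculators use slightly different formulas, so expect their numbers to differ a bit. The 95% confidence and 80% power defaults are common conventions I'm assuming here, and the function name and example rates are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_variation(
    baseline_rate: float,
    relative_lift: float,
    alpha: float = 0.05,  # assumed: 95% confidence, two-sided
    power: float = 0.80,  # assumed: 80% chance of catching a real lift
) -> int:
    """Approximate visitors needed in EACH variation to detect the lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 4% baseline conversion rate, looking for different lifts.
for lift in (0.25, 0.10, 0.05):
    n = visitors_per_variation(0.04, lift)
    print(f"{lift:.0%} relative lift: about {n:,} visitors per variation")
```

Halving the lift you want to detect roughly quadruples the audience you need, which is why small sites are better off testing bold changes than subtle ones.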

8. Always Be Testing

ABT baby. ABT. There’s no reason for you to not always have tests to run. Big tests, little tests, button tests, image tests, layout tests, content tests, you name it. Test different product content or titles, test different calls to action, test different layouts for conversion. If you can’t think of something to test, you’re really not trying.

Good Luck!