A/B AND SEE: A BEGINNER’S GUIDE TO A/B TESTING

November 5, 2015 Gavin Lau

The process of decision making in design has always been a popular area of discussion. Why do some designers make choices that others don’t, and why do some designs seemingly work better than others?

From academic study to sketches and anecdotes, the design world is fascinated with process. But for all of the legendary stories of lore, few anecdotes in recent times have had the staying power of Google and its 41 shades of blue.

In trying to determine which of 2 shades of blue to use for link text, Google tested not only the 2, but the 39 shades of blue in between. The story outlines a relatively minute decision but highlights a rapidly growing approach to making decisions: one based on experimentation, alternatives, and, most importantly, data.

But why did Google test 41 shades of blue, and how could a similar approach help you or your organization? In this post, we’ll explore A/B testing (and its extension, multivariate testing): what it is, why you should do it, and its limitations.

A/B and multivariate testing in a nutshell

At its simplest, A/B testing is a method for comparing 2 versions of something against each other to discover which is more successful. The something can be an image, a button, a headline, or beyond.

Multivariate testing is an expansion of A/B testing where more than 2 versions are compared and (often) more variation is included. This can enable you to test multiple items at once and how they interact together.

For simplicity, the remainder of this post will discuss A/B testing alone, but the principles remain the same for multivariate testing.

Why A/B test

The aim of A/B testing is to enable you to make incremental improvements to your website or app. By pitting your current website or app against one or more variations, you can constantly iterate your design and validate this with real users.

With A/B testing, each test generates new data about what has worked and what hasn’t. Every time something works, it can be incorporated into the website or app, which then becomes the new and improved design to test against.

A/B testing in the real world

To give a flavor of how A/B testing can be used and what it could do for you, you can view hundreds of example tests on websites like Which Test Won. You can also take a look at these popular case studies:

37signals’ account of how they increased conversion by 30% after changing the headline of their signup page.

For the launch of Sim City 5, EA experimented with its order page and showed how an alternative design could increase conversion by 43%.

The basic A/B testing process

Step 1: Where to test
To conduct a basic A/B test, you first need an existing website or app. (A/B testing facilitates incremental improvements to an existing product and is not suitable for testing redesigns or new products and services.)

With your website or app, you must decide on an area you wish to explore and, ultimately, try to improve. Picking which area can come from a number of sources:

Analytics: does your analytics indicate that a particular page or screen is a pain point for your users? Are your users all exiting from the same page?
Usability testing: has usability testing shown one area or interaction to be problematic? Have you tested a new solution and now want to test this at greater scale?
Intuition or personal pet peeves: do you believe that something could be better and want to validate this with data? Is there a part that you have always hated and want to try alternatives?

More often than not, your outline of what to test will come from a mixture of the above. Armed with where you’re going to conduct your testing, you can move on to step 2.

Step 2: What to test (and what to measure)
One of the key aspects of A/B testing is that you change only one variable at a time. At first glance, this seems like a simple task, but it can be quite easy to overstep the mark and add more variables.

For example, if you wanted to test a button, you could test changing the copy of the button, or you could test changing its color.

But if you were to combine both of these and test a button with different copy and a different color, you would drastically reduce the value of the test.

By testing these 2 buttons against each other, you wouldn’t be able to state why they performed differently: how much of the difference in performance was due to the text change, and how much was due to the color change.

So to conduct a valuable A/B test, it’s crucial that you limit changes to one variable. Should you wish to test multiple variables simultaneously, you should conduct a multivariate test, where you’re able to test these multiple variants and better understand what effect each change is having.
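
To see why a multivariate test needs more variants (and therefore more traffic) than a simple A/B test, it can help to list every combination. The sketch below assumes 2 hypothetical button labels and 2 hypothetical colors; none of these values come from a real test.

```python
from itertools import product

# Hypothetical options for the button example; the values are made up.
copies = ["Sign up", "Get started"]
colors = ["green", "orange"]

# A full multivariate test shows every combination of the variables,
# so each change can be judged on its own and together with the others.
variants = [{"copy": copy, "color": color} for copy, color in product(copies, colors)]

for number, variant in enumerate(variants, start=1):
    print(number, variant)
```

With 2 options for each of 2 variables you already have 4 versions to compare, and the count multiplies with every variable you add.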

Whatever test you decide to do, you also must understand and outline the key measurement (or metric) you’ll track. In the case of the button example, it’s most likely you’d measure the number of people who click on the button. For something like a headline change, you may wish to track the bounce rate or time on site.

What you track will come down to what you test. Just make sure you know what you’re trying to improve before you start A/B testing.

Step 3: How to test
Now that you understand what and where you’ll test, it’s down to how. There are numerous applications that enable A/B testing.

These tools all offer the basic A/B testing process but vary in the additional features they supply. Which one you choose can come down to the development skills you have, how much flexibility you require, or simply pricing.

Many large organizations use more than one tool, depending on the development work required or personal preference, so picking the right tool will depend on your own circumstances.

Step 4: How big a test
So, you’ve agreed on the location of your test, the variables you’ll look to optimize, and how you’ll technically implement it all. The final question to answer before rolling everything out: how many users will you test with?

Some tools (such as Google Analytics) don’t allow you to set who will see the original version versus who will see the alternative, or even how long the test is. This can be a useful feature for a beginner as it simplifies the overall process.

If you’d like to set these variables yourself, it’s worth considering both how long the test will run for and what percentage of users you want to see the original version versus the alternative.

If you work in a risk-averse organization, you may wish to show the alternative to only 5-10% of your users, whereas others may split the 2 versions 50/50. The choice is ultimately down to your ambitions and the level and type of traffic your website or app receives.
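
One common way to implement such a split is to hash each user’s ID into a stable bucket. The following is a minimal sketch, not a description of how any particular tool works; the function name, the experiment name, and the 10% exposure are assumptions for illustration.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, exposure: float = 0.10) -> str:
    """Return "B" for roughly `exposure` of users and "A" for everyone else."""
    # Hashing the experiment name together with the user ID means the same
    # user always sees the same version within one experiment.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number between 0 (inclusive) and 1.
    position = int(digest[:8], 16) / 16**8
    return "B" if position < exposure else "A"


# Hypothetical usage: count how many of 10,000 made-up users see the alternative.
shown_b = sum(assign_variant(f"user-{n}", "button-copy") == "B" for n in range(10_000))
print(shown_b)
```

Setting `exposure` to 0.5 gives the 50/50 split; a lower value keeps most users on the original.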

In answering the question of how to split the test and how long it should run, a key question to ask yourself is: how big does the test need to be so that I can be confident the results are accurate?

The technical term for this is statistical significance, or statistical confidence. Your aim: create a test with a big enough sample size that you can say with over 95% certainty, “This change caused that outcome.”
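
As a rough sketch of what “big enough” means, the usual normal-approximation formula for comparing 2 conversion rates fits in a few lines. The baseline rate, the rate you hope to reach, and the 80% power setting are all assumptions chosen for illustration; dedicated calculators may use slightly different formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, expected_rate: float,
                            confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate number of users needed in EACH version of the test."""
    # z-scores for a two-sided test at the chosen confidence and power.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Combined variance of the two conversion rates.
    variance = baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical numbers: 10% of users convert today, and we hope to reach 12%.
needed = sample_size_per_variant(0.10, 0.12)
print(needed)
```

With these made-up rates the answer is roughly 3,800 users per version. The smaller the improvement you want to detect, the more users you need.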

How you split your test is therefore one consideration, but how long you need to run your test may come down to the amount of traffic your website or app gets. Don’t worry—as scary as this may sound, there are plenty of calculators online to help you understand if your results are statistically significant or whether you need to run your test for longer.

Step 5: Analyze and decide
The results are in! You’ve done your test, checked that it’s statistically significant, and now you have numbers.
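
For the curious, the kind of check a significance calculator performs can be sketched as a two-proportion z-test. This is one common approach, not necessarily what any given tool uses, and the visitor and conversion counts are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conversions_a: int, visitors_a: int,
                           conversions_b: int, visitors_b: int) -> float:
    """Two-sided p-value for the difference between 2 conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate: the conversion rate if both versions performed the same.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical results: 400 of 4,000 users converted on A, 480 of 4,000 on B.
p_value = two_proportion_p_value(400, 4000, 480, 4000)
print(f"{p_value:.4f}")
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
```

A p-value below 0.05 corresponds to the 95% threshold; here the made-up numbers land well below it.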

With all of the work you’ve done getting people onboard and setting up the test, many people expect to see a dramatic jump in their key metric. But more often than not, what you get is a far more modest change.

Don’t be discouraged (and certainly don’t despair)—A/B testing is all about making those incremental improvements. And while big changes are possible, any improvement is a great start and puts you on the right path.

Even when the data shows you haven’t made an improvement, you’re now in a stronger position than before as you can confidently state what does and does not work.
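
To put a number on how small or uncertain an improvement is, you can report the lift alongside a confidence interval. The sketch below uses a simple normal approximation and invented counts; the function name and the returned keys are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lift_summary(conversions_a: int, visitors_a: int,
                 conversions_b: int, visitors_b: int,
                 confidence: float = 0.95) -> dict:
    """Absolute lift, relative lift, and a confidence interval for B versus A."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    difference = rate_b - rate_a
    # Standard error of the difference between the 2 conversion rates.
    standard_error = sqrt(rate_a * (1 - rate_a) / visitors_a
                          + rate_b * (1 - rate_b) / visitors_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "absolute_lift": difference,
        "relative_lift": difference / rate_a,
        "interval": (difference - z * standard_error, difference + z * standard_error),
    }


# Hypothetical results: 400 of 4,000 users converted on A, 420 of 4,000 on B.
summary = lift_summary(400, 4000, 420, 4000)
print(summary)
```

In this made-up case the alternative looks 5% better in relative terms, but the interval spans zero, so the test has not shown a real improvement.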

If your test has been successful, the next steps are up to you. You may want to roll the new version out to people as soon as possible. Or, if you did a small test to begin with, you may want to do another test where even more people are shown the variant.

What you do with your newfound information is ultimately up to you!

Understanding the limitations of A/B testing

As great and as powerful as A/B testing can be, it’s also important to understand its weaknesses and limitations. In spite of its growing popularity, A/B testing is not a silver bullet that can save every company, but rather another tool in your toolkit.

When considering A/B testing, it’s important to understand what it cannot do:

Tell you why. A/B testing is a fantastic tool for understanding what works and what doesn’t. What it cannot tell you, however, is why. For that, you’ll need to conduct qualitative user research. This is crucial to understand: data does not equal understanding.
Let you test drastic redesigns of your website or app. In theory, you could pit an entire page design against an alternative and get data on its performance, but you wouldn’t be able to understand what about that design was causing any change in performance. Was it the design, the copy, the links? Unless you coupled such an exercise with user research, the results would be meaningless.
Tell you if you’re solving the right problem. Due to its incremental nature, A/B testing can be a powerful tool for continuously improving your website or app. But it can’t tell you if you’re solving the correct problem. You may be focusing your tests on the homepage and seeing improvement, while another area of the site is the real problem. This is a concept known as the local maximum.
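
The local maximum is easier to picture with a toy example. In the sketch below, each number stands for a made-up design variation and its made-up conversion rate; the “test” only ever compares the current design with its nearest neighbors, which is a deliberate simplification of incremental A/B testing.

```python
# Invented conversion rates for 10 neighboring design variations (index 0 to 9).
# There is a small peak at design 2 and a higher one at design 8.
LANDSCAPE = [0.02, 0.03, 0.04, 0.03, 0.02, 0.03, 0.05, 0.07, 0.09, 0.06]


def conversion_rate(design: int) -> float:
    return LANDSCAPE[design]


def climb(start: int) -> int:
    """Keep moving to a better neighboring design until none is better."""
    current = start
    while True:
        neighbors = [d for d in (current - 1, current + 1) if 0 <= d < len(LANDSCAPE)]
        best = max(neighbors, key=conversion_rate)
        if conversion_rate(best) <= conversion_rate(current):
            return current  # no small change helps any more
        current = best


print(climb(0))  # 2: stuck on the small peak
print(climb(5))  # 8: a different starting point reaches the higher peak
```

Starting from design 0, every small step is an improvement until design 2, where the process stalls even though a much better design exists further away.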

What can A/B testing do for you?

If all of this has whetted your appetite for A/B testing, you should hopefully have the information you need to get started. Some minor details may differ, or your organization may have specific needs—but the overall principles remain the same.

A/B testing can be a great tool when used in the right way and for the right reasons. It can enable your company to deliver incremental improvements and increase your success.

But it’s important to understand that A/B testing is one tool in a much wider arsenal for any designer. In his resignation note, Doug Bowman, former Visual Design Lead at Google, was also keen to note the anecdote of Google’s 41 shades of blue. So while A/B testing can provide great value, this should not come at the expense of other areas of design.