WHY 5 IS THE MAGIC NUMBER FOR UX USABILITY TESTING
The research needed to create a flawless user experience is of greater importance than ever, but there remains a lot of confusion surrounding the process of usability testing.
In-depth research is one thing, but having actual tests to check the solidity of your design is just as important. Many people and companies shy away from the practice of usability testing, believing it involves a lot of resources and expenses. And while others understand that it doesn’t have to be expensive or time-consuming, they still don’t get it completely right and overcomplicate things—rendering their usability tests less useful than they could be.
In reality, the best results come from stringing together many smaller tests that include 5 users at a time.
“When it comes to usability testing, 5 is the magic number.”
Why is usability testing important?
Because user experience is such a large field and UX designers are charged with the giant task of building a pleasing “overall experience,” flaws are a natural and often unavoidable part of the design process.
Remember the HealthCare.gov launch? It was an info-dense hellstorm of a website that was consistently abandoned by frustrated users. Despite a well-designed, simple wireframe with clearly labeled information and a navigable structure, the deeper people got into the process of finding insurance, the more complicated their experience became.
The highlight of this failure was an overblown security process that took up to 60 minutes to complete, leaving users to scramble for alternatives. Situated Research found that “the early stages of application on HealthCare.gov look simple, and encourage users to begin an application; however, the reality is a long process with difficulties that waste users’ time and a delayed gratification of shopping for coverage.”
“The more users you add to a test group, the less each additional user teaches you.”
Usability testing is the number-one tool for catching these flaws before launch. For HealthCare.gov, the lack of proper usability testing contributed to nationwide difficulty in obtaining insurance through the public marketplace and boatloads of public bashing, much of which could have been avoided. Despite popular misconceptions about how cumbersome this process is, you really only need 5 test users per testing group to make sure your product offers a smooth experience.
What’s so magical about the number 5?
Given that the average probability of a user encountering a given problem during testing is 31%, according to Jeff Sauro of MeasuringU, testing just 5 users would turn up about 85% of the problems in an interface. This conclusion comes from binomial probability (closely related to, though not the same as, the Poisson distribution), which can show us the chances of achieving n successes in N trials. With a 31% problem frequency, returns diminish drastically once you add more than 5 users to a test group: the more users you add, the less each additional user teaches you.
If you have 3 test users, you’ll catch about 67% of the problems. With 4 users you’ll catch about 77%, and with 5 you’ll catch about 85%. Once you cross the threshold at 5 and begin to add more test users, each one uncovers fewer new issues: With 6, you’ll catch about 90% of the issues; with 8, about 95%; and with 12, about 99%.
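The percentages above follow from the standard problem-discovery formula, 1 - (1 - p)^n, where p is the chance that a single user runs into a given problem and n is the number of test users. Here is a minimal Python sketch of that arithmetic, assuming p = 0.31 as cited above. The function name discovery_rate is made up for illustration, and the exact outputs can differ by a point or two from the rounded figures quoted in prose.

```python
def discovery_rate(p: float, n: int) -> float:
    """Chance that a problem affecting a share p of users
    shows up at least once among n test users."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Assumption: 31% average problem frequency, as cited in the article.
    # Your own product may have a very different value.
    p = 0.31
    previous = 0.0
    for n in range(1, 13):
        rate = discovery_rate(p, n)
        # The second figure is the extra share of problems the n-th user adds.
        print(f"{n:2d} users: {rate:6.1%} found, +{rate - previous:5.1%} from the last user")
        previous = rate
```

Running it prints one line per group size. The second figure on each line shrinks with every added user, which is the diminishing-returns effect described above.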
These diminishing returns are due to the fact that user experiences overlap. According to the Nielsen Norman Group, “As soon as you collect data from a single test user, your insights shoot up and you have already learned almost a third of all there is to know about the usability of the design… When you test the second user, you will discover that this person does some of the same things as the first user…
People are definitely different, so there will also be something new that the second user does that you didn’t observe with the first user…
The third user will do many things that you already observed… Plus, of course, the third user will generate a small amount of new data, even if not as much as the first and second user did.”
The amount of new data collected decreases with each test user, and by 5 test users the curve has flattened enough that each additional user adds comparatively little. That makes 5 a sensible balance between cost and insight.
Research difficulties
A word of caution: Don’t fetishize the numbers here by solely focusing on your test users’ quantitatively recorded feedback. The qualitative experience of your users is critical—your test users will struggle with a certain aspect of your website, and understanding why and working with them to brainstorm solutions is essential.
For example, perhaps your test users found the shift in color scheme from the homepage to the signup screen jarring. Instead of only receiving some vague quantification of the color scheme issues, you can prompt users to explain. The blue used for the navigation bar on the homepage is too similar to the blue on the signup screen’s continue button? That’s an easy fix.
At the same time, getting the most out of your 5 test users in a research session requires a specific skillset, so your demeanor and positioning in relation to the test user are key. No test user wants someone hovering over them, asking leading questions, and appearing hurt when criticism is voiced, and no test user will respond candidly in an uncomfortable setting. Conduct yourself accordingly. Put test users at ease by opening with an informal chat, clearly explaining your goals at the start of the meeting, and continuing to speak conversationally with curiosity as your main motivation. How did you feel about X? Why did Y make you feel Z?
“With test users, speak conversationally with curiosity as your main motivation.”
The controversy
No one doubts the math, but there’s some controversy surrounding the use of 31% as the average problem frequency. While 31% is a problem frequency that has been derived from many studies, only very new websites and applications may be that error-prone. More streamlined and polished products may, in fact, have a much lower problem frequency. For instance, let’s say one issue will only affect 5% of the population. According to binomial probability, at that frequency you would need about 37 users, not 5, for an 85% chance of seeing the problem at least once.
“As a baseline, 5 is still the golden rule in UX usability testing.”
Of course, it’s impossible to know the probability of discovery for every potential problem. According to MeasuringU, “As a strategy, pick some percentage of problem occurrence, say 20%, and likelihood of discovery, say 85%, which would mean you’d need to plan on testing 9 users.
After testing 9 users, you’d know you’ve seen most of the problems that affect 20% or more of the users. If you need to be surer of the findings, then increase the likelihood of discovery, for example, to 95%. Doing so would increase your required sample size to 13.”
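The planning strategy quoted above can be turned around into a small calculation: pick a problem occurrence rate and a target likelihood of discovery, then find the smallest group that reaches it. The Python sketch below does that by brute force under the same 1 - (1 - p)^n model. The function name users_needed is hypothetical, and the strict "at least the target" cutoff is an assumption of this sketch, not necessarily the rounding rule MeasuringU uses.

```python
def users_needed(p: float, target: float) -> int:
    """Smallest number of test users n such that a problem affecting
    a share p of users is seen at least once with probability >= target."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    # Grow the group one user at a time until the target is reached.
    while 1.0 - (1.0 - p) ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    # (occurrence rate, target likelihood of discovery) pairs from the article
    for p, target in [(0.20, 0.85), (0.20, 0.95), (0.05, 0.85)]:
        print(f"occurrence {p:.0%}, target {target:.0%}: {users_needed(p, target)} users")
```

With a 20% occurrence rate and an 85% target, this gives 9 users, matching the quoted figure. For the 95% target, the strict cutoff used here gives 14. Thirteen users reach roughly 94.5%, which is consistent with the quoted 13 if that figure is rounded.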
Any number of factors, including the level of refinement of your website, the size of your user base, and the target likelihood of discovery for a problem, can change your magic number for usability testing drastically.
As a baseline, though, 5 is still the golden rule. Limit testing to 5 users and you’ll uncover the majority of problems that plague your website or app, while still keeping your costs low and the process simple.