Usability testing guru Jakob Nielsen recently published an interesting comparison between A/B Testing, Usability Testing, and Radical Innovation. Despite my somewhat sensational headline for this post, it truly is worth a read, and I have great respect for Mr. Nielsen. Not only has he been making the Web less frustrating for all of us to use for a long time, but he also has sweet sideburns.
The article compares three methods of “achieving better design” in terms of cost, benefit, cadence, risk, and more. Throughout, Nielsen advocates giving all three methods a try, if feasible, to find the right mix for your business. I wholeheartedly agree that testing, usability engineering, and radical innovation would make an amazing mix for any company if the conditions are right.
Nielsen’s article sets realistic expectations about the rarity of radical innovation, and I think his example of the iPhone is apt. Mobile phones had been around for a long time, and plenty of designs flopped, before Apple’s radical re-imagining of the mobile device took the consumer world by storm.
I also agree that even if you have the luck or pluck to achieve radical innovation, you should still plan on using both usability testing and A/B testing to continuously improve your offering over time. Because nothing brings on competitors and copycats like a game-changing, innovative product! Even if you don’t achieve radical innovation, a long-term culture of continuous quality improvement may end up being worth more money than a one-time stroke of brilliant design.
Where I disagree with Nielsen’s article is his assertion that A/B testing tends to find small improvements and achieve only small gains in KPIs:
A/B testing usually identifies small improvements that might increase sales or other KPIs by a few percent. Sometimes you’re lucky and get 10% or more.
While it may be true that, on the whole, A/B tests end up being relatively low impact, that doesn’t mean they have to be! Low-impact A/B tests are usually run by marketers who want to explore testing but don’t necessarily have the training or skills to design and execute a high-impact test.
A/B tests conducted by people with expertise in usability, persuasion, design, and optimization, and grounded in the scientific method, regularly achieve KPI gains well above 10%. I assure you that when I design an A/B test, I’m aiming much higher than 10%, and I would consider a 10% lift a “win” only if the site had massive amounts of traffic or was already highly optimized.
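To put that traffic caveat in rough numbers, here’s a back-of-the-envelope sketch (my own illustration, not anything from Nielsen’s article) of how many visitors per variation a standard two-proportion test needs to reliably detect a given lift. The baseline rate, function name, and figures are all assumptions for the example.

```python
# Back-of-the-envelope sample-size estimate for a two-proportion A/B test.
# All numbers here are illustrative assumptions, not figures from the article.
from math import ceil

def visitors_per_variation(baseline_rate, relative_lift,
                           z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per arm to detect `relative_lift` over
    `baseline_rate` at ~95% confidence (z_alpha) and ~80% power (z_beta)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a 10% lift on a 3% baseline takes far more traffic than a 30% lift:
print(visitors_per_variation(0.03, 0.10))  # roughly 53,000 visitors per arm
print(visitors_per_variation(0.03, 0.30))  # roughly 6,400 visitors per arm
```

In other words, a 10% lift is only a practical “win” when you have the traffic to confirm it in a reasonable amount of time.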
Another point of contention is the assertion that A/B testing is only useful when you’re testing minor differences between variations:
A/B’s advantage is that it’s the only way to reliably determine the best design approach when there’s little difference between alternatives.
While it’s accurate that you can use A/B testing to determine the winner between relatively similar variations, you can also use it to explore radically different approaches to layout, design, persuasion tactics, product positioning, pricing, and more. It’s these more “radical” tests that often deliver the much larger, double- and triple-digit increases. Of course, they also carry more risk.
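As a hypothetical illustration (the variant roles and numbers below are fabricated), the same two-proportion machinery used to score small tweaks works just as well for scoring a radical redesign against the control:

```python
# Hypothetical two-proportion z-test: control layout vs. a radical redesign.
# Conversion counts and sample sizes are fabricated purely for illustration.
from math import sqrt, erf

def lift_and_p_value(conv_a, n_a, conv_b, n_b):
    """Relative lift of B over A and a two-sided p-value for the difference."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b / p_a - 1, p_value

# Control converts 150 of 5,000 visitors; the redesign converts 240 of 5,000.
lift, p = lift_and_p_value(conv_a=150, n_a=5000, conv_b=240, n_b=5000)
print(f"lift: {lift:.0%}, p-value: {p:.4g}")  # a 60% lift, highly significant
```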
I’m definitely a proponent of usability testing, having spent years designing, facilitating, observing, and analyzing usability tests. User testing is extremely useful in new product development, i.e., before you’ve brought something to market.
In some cases, I’d recommend usability testing before recommending optimization via A/B/n or multivariate testing. It really depends on what shape the application or website is in to begin with, and how much interactivity the user experience (UX) contains.
In conclusion, we seem to agree that A/B testing and usability testing can, and should, play in the same sandbox, but I believe Nielsen’s article greatly undervalues the effectiveness (and ROI) of split testing. I say this as someone who’s spent time in both camps (usability and optimization): both usability testing and split testing can be extremely valuable when applied by experts, and split testing can achieve significant gains in KPIs when done with creativity, discipline, and expertise.
Have you read Nielsen’s full article? Do you feel A/B testing has been given fair representation? Am I overreacting to a perhaps innocent summary attempt?