
The Most Radical A/B Test I've Ever Done

About four years ago, I started offering Bingo Card Creator for purchase.  Today, I stopped offering it.

That isn’t true, strictly speaking.  The original version of Bingo Card Creator was a downloadable Java application.  It has gone through a series of revisions over the years, but is still there in all its Swing-y glory.  Last year, I released an online version of Bingo Card Creator, which is built with Rails and AJAX.

My personal feeling (backed by years of answering support emails) is that my customers do not understand the difference between downloadable applications and web applications, so I sold Bingo Card Creator without regard to the distinction.  Everyone, regardless of which they are using, goes to the same purchasing page, pays the same price, and is entitled to use either (or both) at their discretion.  It is also sold as a one-time purchase, which is highly unusual for web applications.  This is largely because I was afraid of rocking the boat last summer.

The last year has taught me quite a bit about the difference between web applications and downloadable applications.  To wit: don’t write desktop apps.  The support burden is worse, the conversion rates are lower, each trip through the experimental loop takes longer, and they retard experimentation in a million and one ways.

Roughly 78% of my sales come from customers who have an account on the online version of the software.  I have tried slicing the numbers a dozen ways (because tracking downloads to purchases is an inexact science in the extreme), and I can’t come up with any explanation other than “The downloadable version of the software is responsible for a bare fraction of your sales.”  I’d totally believe that, too: while the original version of the web application was rough and unpolished, after a year of work it now clocks the downloadable version in almost every respect.

I get literally ten support emails about the downloadable application for every one I get about the web application, and one of the first things I suggest to customers is “Try using the web version, it will magically fix that.”

  • I’m getting some funky Java runtime error.  Try using the web application.
  • I can’t install things on this computer because of the school’s policies.  Try using the web application.
  • How do I copy the files to my niece’s computer?  By the way it is a Mac and I use a Yahoo.  Try using the web application.

However, I still get thousands of downloads a month… and they’re almost all getting a second-best experience and probably costing me money.

Thus The Experiment

I just pushed live an A/B test which was complex, but not difficult.  Testers in group A get the same experience they got yesterday, testers in group B get a parallel version of my website in which the downloadable version never existed.  Essentially, I’m A/B testing dropping a profitable product which has a modest bit of traction and thousands of paying customers.

This is rather substantially more work than typical “Tweak the button” A/B tests: it means that I had to make significant sitewide changes in copy, buttons, calls to action, ordering flow, page architecture, support/FAQ pages, etc.  I gradually moved towards this for several months on the day job, refactoring things so that I could eventually make this change in a less painful fashion (i.e. without touching virtually the entire site).  Even with that groundwork laid, when I “flipped the switch” just now it required changing twenty files.

Doing This Without Annoying Customers

I’m not too concerned about the economic impact of this change: the A/B test is mostly to show me whether it is modestly positive or extraordinarily positive.  What has kept me from doing it for the last six months is the worry that it would inconvenience customers who already use the downloadable version.  As a result, I took some precautions:

The downloadable version isn’t, strictly speaking, EOLed.  I’ll still happily support existing customers, and will keep it around in case folks want to download it again.  (I don’t plan on releasing any more versions of it, though.  In addition to being written in Java, a language I have no desire to use in a professional capacity anymore, the program is a huge mass of technical debt.  The features I’d most keenly like to add would require close to a whole rewrite of the most complex part of the program… and wouldn’t generate an uptick in conversion anywhere near large enough to make that a worthwhile use of my time, compared to improving the website, the web version, or working on other products like Appointment Reminder.)

I extended A/Bingo (my A/B testing framework) to give a way to override the A/B test choices for individual users.  I then used this capability to intentionally exclude from the A/B test (i.e. show the original site and not count) folks who hit a variety of heuristics suggesting that they probably already used the downloadable version.  One obvious one is that they’re accessing the site from the downloadable version.  There is also a prominent link in the FAQ explaining where it went, and clicking a button there will show it.  I also have a URL I can send folks to via email to accomplish the same thing, which was built with customer support in mind.
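A minimal sketch of how such an override could work, written in plain Ruby rather than against A/Bingo's real API: users are normally assigned by hashing their identity together with the test name, but an override pins a user to the original site and marks them as not counted. The class name, the `BingoCardCreator` user-agent marker, and the FAQ-button flag are all hypothetical stand-ins for the heuristics described above.

```ruby
require "digest/md5"

# Hypothetical sketch, not A/Bingo's API: deterministic assignment plus
# per-user overrides that pin someone to an alternative and exclude them
# from the statistics.
class OverridableTest
  Assignment = Struct.new(:alternative, :counted)

  def initialize(name, alternatives)
    @name = name
    @alternatives = alternatives
    @overrides = {} # identity => forced alternative
  end

  # Pin this user to one alternative; they will not be counted.
  def override!(identity, alternative)
    raise ArgumentError, "unknown alternative: #{alternative}" unless @alternatives.include?(alternative)
    @overrides[identity] = alternative
  end

  def assign(identity)
    forced = @overrides[identity]
    return Assignment.new(forced, false) if forced

    # Hashing the test name with the identity keeps each user's
    # assignment stable across requests.
    index = Digest::MD5.hexdigest("#{@name}:#{identity}").to_i(16) % @alternatives.size
    Assignment.new(@alternatives[index], true)
  end
end

# Hypothetical heuristic: the desktop app is assumed to identify itself in
# its user agent, and the FAQ button is assumed to set a flag.
def probably_desktop_user?(user_agent:, clicked_faq_button:)
  clicked_faq_button || user_agent.to_s.include?("BingoCardCreator")
end

experiment = OverridableTest.new("web_only_site", %w[original web_only])
if probably_desktop_user?(user_agent: "BingoCardCreator/Java", clicked_faq_button: false)
  experiment.override!("user-42", "original")
end
p experiment.assign("user-42") # original site, not counted
```

Hashing rather than storing a random choice means a user who is never overridden still sees the same version on every visit without any extra storage.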

I also scheduled this test to start during the dog days of summer.  Seasonally, my sales always massively crater during the summer, which makes it a great time to spring big changes (like, e.g., new web applications).  Most of my customers won’t be using the software again until August, and that gives me a couple of months to get any kinks out of the system prior to them being seen by the majority of my user base.

My Big, Audacious Goal For This Test

I get about three (web) signups for every two downloads currently, and signups convert about twice as well as downloads do.  (Checking my math, that would imply a 3:1 ratio of sales, which is roughly what I see.)  If I were able to convert substantially all downloads to signups, I would expect to see sales increase by about 25%.
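The same arithmetic as a quick Ruby check, using only the ratios from the paragraph above. Downloads are treated as the baseline conversion rate of 1.0, and the method names are illustrative.

```ruby
# Relative sales from each channel: prospects x relative conversion rate.
# Downloads are the baseline (1.0); signups convert signup_multiple times as well.
def channel_sales(signups, downloads, signup_multiple)
  { signups: signups * signup_multiple, downloads: downloads * 1.0 }
end

# Fractional sales increase if every download became a signup instead.
def all_signup_uplift(signups, downloads, signup_multiple)
  current = channel_sales(signups, downloads, signup_multiple).values.sum
  (signups + downloads) * signup_multiple / current - 1
end

sales = channel_sales(3, 2, 2)
puts "sales ratio #{sales[:signups] / sales[:downloads]}:1"    # prints "sales ratio 3.0:1"
puts format("uplift %.0f%%", all_signup_uplift(3, 2, 2) * 100) # prints "uplift 25%"
```

Today that is 6 + 2 = 8 units of sales; with all five prospects signing up it becomes 10, hence the 25%.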

There are a couple of follow-on effects that would have:

  • I think offering two choices probably confuses customers and decreases the total conversion rate.  Eliminating one might help.
  • Consolidating offerings means that work to improve conversion rates automatically helps all prospects, rather than just 60%.

Magic Synergy Of Conversion Optimization And AdWords

Large systemic increases in conversion rates let me walk up AdWords bids.  For example, I use Conversion Optimizer.  Essentially, rather than bidding on a cost per click basis I tell Google how much I’m willing to pay for a signup or trial download.  I tell them 40 cents, with the intention of them actually getting the average at around 30 cents, which implies (given my conversion from trials/signups to purchase) that I pay somewhere around $12 to $15 for each $30 sale.  Working back from 30 cents through my landing page conversion rate, it turns out I pay about 6 cents per click.
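Working backwards from the averages above recovers the conversion rates they imply. This is a back-of-the-envelope sketch using only the stated figures (6 cents per click, 30 cents per trial, $12 to $15 of ad spend per sale); the helper names are illustrative.

```ruby
# Back-of-the-envelope check of the figures above (all amounts in dollars).

# Share of ad clicks that turn into a trial (signup or download).
def landing_conversion(cost_per_click, cost_per_trial)
  cost_per_click / cost_per_trial
end

# Share of trials that turn into a sale, given ad spend per sale.
def trial_to_sale_conversion(cost_per_sale, cost_per_trial)
  cost_per_trial / cost_per_sale
end

puts format("landing page conversion: %.0f%%", landing_conversion(0.06, 0.30) * 100)
[12.0, 15.0].each do |cost_per_sale|
  rate = trial_to_sale_conversion(cost_per_sale, 0.30)
  puts format("$%.0f per sale implies %.1f%% trial-to-sale", cost_per_sale, rate * 100)
end
```

That works out to roughly 20% of clicks becoming trials, and 2% to 2.5% of trials becoming $30 sales.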

Now, assuming my landing page conversion is relatively constant but my trial to sale conversion goes up by 25%, instead of paying $12 to $15 a sale I’d be paying $9.60 to $12 a sale.  I could just pocket the extra money, but rather than doing that, I’m probably going to tell Google “Alright, new deal: I’ll pay you up to 60 cents a trial”, actually end up paying about 40 cents, and end up paying about 8 cents per click.  The difference between 6 and 8 will convince Google to show my ads more often than those of some competitors, increasing the number of trials I get per month out of them.  (And, not coincidentally, my AdWords bill.  Darn, that is a bloody brilliant business model, where they extract rent every time I do hard work.  Oh well, I still get money, too.)
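A sketch of the projection in the paragraph above, under its stated assumptions: trial-to-sale conversion improves by 25% and landing page conversion stays at the roughly 20% implied by 6 cents per click over 30 cents per trial. The helper names are illustrative.

```ruby
# Projection under the scenario above: trial-to-sale conversion improves by
# 25% while landing page conversion holds at about 20%.
def cost_per_sale_after(cost_per_sale, improvement)
  cost_per_sale / improvement
end

def cost_per_click(cost_per_trial, landing_conversion)
  cost_per_trial * landing_conversion
end

[12.0, 15.0].each do |cost|
  puts format("$%.2f per sale becomes $%.2f", cost, cost_per_sale_after(cost, 1.25))
end
# New deal: bid up to 60 cents per trial, actually pay about 40 cents.
puts format("about %.0f cents per click", cost_per_click(0.40, 0.20) * 100)
```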

We’ll see if this works or not.  As always, I’ll be posting about it on my blog.  I’m highly interested in both the numerical results of the A/B test as well as whether this turns out to be a win-win for my customers and myself or whether it will cause confusion at the margin.  I’m hoping not, but can’t allow myself to stay married to all old decisions just out of a desire to be consistent.

Stats Bug In A/Bingo v1.0.0 and earlier

Many thanks to Ivan for reporting this one: there is a significant bug in A/Bingo calculation of z-scores for versions 1.0.0 and earlier, which borks substantially all z-score calculations and in some cases can change whether A/B test results are reported as statistically significant or not. 

The bug is all of one character long:

def zscore()
#omitted for clarity
 cr1 = alternatives[0].conversion_rate
 cr2 = alternatives[1].conversion_rate
 n1 = alternatives[0].participants
 n2 = alternatives[1].participants

 numerator = cr1 - cr2
 frac1 = cr1 * (1 - cr1) / n1
 frac2 = cr2 * (1 - cr1) / n2   #this line is bugged: (1 - cr1) should be (1 - cr2)
 numerator / ((frac1 + frac2) ** 0.5)
end
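For reference, here is the snippet with the one-character fix, made self-contained. `Alternative` is a hypothetical Struct standing in for A/Bingo's alternative records (only `participants` and `conversion_rate` are assumed); the formula itself is the unpooled two-proportion z-score from the code above.

```ruby
# Hypothetical stand-in for A/Bingo's alternative records.
Alternative = Struct.new(:participants, :conversions) do
  def conversion_rate
    participants.zero? ? 0.0 : conversions.to_f / participants
  end
end

# Two-proportion z-score using the unpooled standard error.
def zscore(alternatives)
  cr1 = alternatives[0].conversion_rate
  cr2 = alternatives[1].conversion_rate
  n1 = alternatives[0].participants
  n2 = alternatives[1].participants

  numerator = cr1 - cr2
  frac1 = cr1 * (1 - cr1) / n1
  frac2 = cr2 * (1 - cr2) / n2   # fixed: previously used (1 - cr1)
  numerator / ((frac1 + frac2) ** 0.5)
end

a = Alternative.new(100, 50)   # 50% conversion
b = Alternative.new(100, 40)   # 40% conversion
puts zscore([a, b])            # about 1.43
puts zscore([b, a])            # about -1.43
```

Swapping the order of the alternatives should only flip the sign of the result. The bugged line breaks that symmetry, which makes it a cheap property to assert in a unit test.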

I have fixed the bug (via the Slicehost console, on a Japanese cafe Internet PC, because I am stuck in Nagoya today again) and pushed the fix to the git repository.

Does this make my results invalid?

You can probably still have confidence in results you got from A/Bingo previously. While the numerical calculation of the z-score was borked, it was borked subtly enough that most statistically significant tests will retain their significance under the borked calculation, and most statistically insignificant tests will not magically gain significance because of it. (My quick eyeball suggests that it causes A/Bingo to overstate the significance of tests which are very significant and understate the significance of tests which are insignificant, which is a very fortuitous set of properties for a random bug to have in an A/B testing framework.)

I have re-run statistical confidence tests for everything I’ve ever done for BCC that I still have data for, and no experimental results changed as a result of the error. Nonetheless, I deeply regret the bug, and will write unit tests for the statistics code as soon as I am physically capable of doing so to rule out the possibility of this sort of thing in the future.

Lesson from Madlibs Signup Fad: Do Your Own Tests

Periodically, news of an innovative, goofy, compelling, or compellingly goofy design decision will sweep across the Internets like wildfire.  Most recently, this happened with a madlibs-looking lead generation form.

I think it has much to recommend it in the context of lead generation forms (long, arduous monstrosity that you sign up for in the hopes you are contacted but not spammed to death), but I didn’t see much possible upside for using it on a new user registration form (short form which you sign up to use something).

However, I’m wary of trusting my instincts on such things when I could trust data instead.  There is a key point about A/B testing: trust your data, not somebody else’s data.  After all, you only make money when it improves your conversion rate, not their conversion rate.  Feel free to use other folks’ successful experiments for inspiration, but for heaven’s sake use them to inspire you to run tests, rather than to fire blindly.

I was particularly wary about trusting this result because, as numerous people pointed out in the Hacker News discussion, roughly seven things changed between the standard form and the madlib form in that A/B test, and there is no particular reason to assume that the salient difference was caused by the part which strikes us as creative as opposed to more boring things like, e.g., the call to action in the header.

When In Doubt, Test.  (When Not In Doubt, Test Twice.)

No less than six people said “Hey Patrick, have you seen this madlibs thing yet?  You’ve got to try it.”  My A/B testing framework makes a new test a one-line proposition, so I decided I’d humor them.  I isolated just the madlibs versus standard style for the test, knocked up an alternative in about ten minutes with my (decidedly limited) CSS and JavaScript skills, and set them against each other.  My conversion goal for this test is successfully inducing someone to sign up for the free trial of Bingo Card Creator.

My Usual Registration Form

The Madlibs Registration Form

P.S. If you have good eyes you’ll spot the other A/B test ongoing on this page.  I’m using the traditional way of mitigating cross-test interaction… ignoring the possibility of it.  Don’t tell your college stats professor, but this actually works pretty well in practice.

Results

I ran this test until A/Bingo, my A/B testing framework for Rails, told me that further testing was just a waste of my time.  It didn’t take long at all — 34 hours after the test alternative went live for the site, the first time I checked the results, they were already overwhelming.  Let me copy/paste right off my public results page:

Signup Madlibs Versus Standard
  • Standard: 27.55% conversion (winner)
  • Madlibs: 21.73% conversion
  • Confidence: 95%

By my count that is roughly a 21% decrease in conversion rate for the madlibs signup style relative to the standard style, and the fact of the decrease (but not the magnitude) is significant at the 95% confidence level.
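Reproducing those numbers takes a few lines. The conversion rates come from the results above; the 368/368 split is an assumption, since only the total of 736 participants "split roughly 50/50" is known, so the z-score here is approximate. The formula is the same unpooled two-proportion z-score as A/Bingo's `zscore`, with the corrected variance term.

```ruby
# Relative change of the variant against the control.
def relative_change(control_rate, variant_rate)
  (variant_rate - control_rate) / control_rate
end

# Unpooled two-proportion z-score.
def two_proportion_z(cr1, n1, cr2, n2)
  (cr1 - cr2) / Math.sqrt(cr1 * (1 - cr1) / n1 + cr2 * (1 - cr2) / n2)
end

standard, madlibs = 0.2755, 0.2173
puts format("relative change: %.1f%%", relative_change(standard, madlibs) * 100)
# Assumes an exact 368/368 split of the 736 participants.
puts format("z-score: %.2f", two_proportion_z(standard, 368, madlibs, 368))
```

Under that assumption the z-score lands a little above 1.8, which clears the one-sided 95% critical value of about 1.645; the exact figure depends on the real split.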

For the curious: there were 736 participants in this test, split roughly 50/50, as you would expect.  I love the Internet because where else can you get 736 people to help you improve your website while you sleep, work at the day job on Saturday, have an evening out with friends, and then sleep some more?

Anyhow: test ended, not touching the madlibs idea again.  Before adopting this or any other fad (or good suggestion, for that matter): do your own A/B tests.

A/Bingo 1.0.0 Official Release

Back in August I released A/Bingo, an MIT-licensed OSS Rails A/B testing framework.  I have been using it continuously on Bingo Card Creator, and judging from the support requests I’ve been getting, it has gotten some traction in the Rails world.  The 5,000 or so people seeing A/B tests on my site on Valentine’s Day are almost certainly less than 1% of the beneficiaries of the software now. Yay.

As A/Bingo has grown in popularity, I have begun to get requests for features that I did not need urgently for my own development, as well as the usual support requests, patches, and the like.  I want to make your use of the software as pleasant as possible to further evangelize the cause of A/B testing, so here you go:

New features:

A/Bingo now ships with a default dashboard.  Previously, I assumed that everyone would be writing their own dashboard code, so I just included the absolute minimum to show you what you’d need to do to get data out of A/Bingo.  Many people have remarked that they would really appreciate a “works out of the box” solution.  Your wish is my command: you can now enable a default dashboard in about 30 seconds. It would work totally out of the box, but there are security implications, so I wanted you to have to think for a moment prior to enabling it.

#Create a new controller.  The name is up to you -- this example uses abingo_dashboard_controller.rb
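class AbingoDashboardController < ApplicationController
  #Add your own authentication filter here.  Anyone who can reach this controller can stop experiments.
  include Abingo::Controller::Dashboard
end

#Add a route for it in config/routes.rb:
map.abingo_dashboard "/abingo_dashboard/:action/:id", :controller => :abingo_dashboard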

You can customize the dashboard code yourself. Nota bene: it uses your application layout, and has CSS classes applied to most of the elements, so you can style it quickly with CSS if you desire to. By default, it probably looks terrible. If you want to send me a patch to make it pretty, be my guest.

Experiments can now be stopped: Using either the built-in links on the above controller or, if you prefer programmatically scripting things, experiment.end_experiment!(alternative_content), you can now stop an experiment without touching the code.  Stopping an experiment causes all users to get the specified alternative rather than what they would have gotten randomly.  It also ceases stats collection.  Stopping an experiment is irreversible (currently — that might change later).  I tried to make this feature not affect the performance of A/Bingo for larger sites — it makes each test require one extra cache access.  (*cough* Rounding error, hopefully.)
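A toy model of those semantics, assuming nothing about A/Bingo's internals: once an experiment is stopped, every user gets the chosen alternative, conversions are no longer recorded, and stopping cannot be undone. The class is hypothetical, and the `stopped?` check stands in for the extra cache read mentioned above.

```ruby
require "zlib"

# Hypothetical toy model (not A/Bingo's code) of stopping an experiment.
class StoppableExperiment
  attr_reader :conversions

  def initialize(alternatives)
    @alternatives = alternatives
    @winner = nil                 # set once the experiment is stopped
    @conversions = Hash.new(0)    # alternative => conversion count
  end

  def stopped?
    !@winner.nil?
  end

  # Irreversible here, matching the behaviour described above.
  def end_experiment!(alternative)
    raise ArgumentError, "unknown alternative: #{alternative}" unless @alternatives.include?(alternative)
    raise "experiment already stopped" if stopped?
    @winner = alternative
  end

  def choose(identity)
    # The one extra check per request; in production this would be a cache read.
    return @winner if stopped?
    @alternatives[Zlib.crc32(identity.to_s) % @alternatives.size]
  end

  def convert!(alternative)
    return if stopped? # stats collection ceases once stopped
    @conversions[alternative] += 1
  end
end

experiment = StoppableExperiment.new(%w[red_button green_button])
experiment.convert!("red_button")
experiment.end_experiment!("green_button")
experiment.choose("any user")     # => "green_button" for everyone
experiment.convert!("red_button") # ignored; still 1 recorded conversion
```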

A/Bingo internals are now fairly thoroughly tested: Unit tests are not exactly my cup of tea (“Argh, it works in production, what else do you want from me?!”), but Rails developers look askance at software that does not include them.  So I knuckled down and wrote a test suite.  (Hat tip to Nathaniel Talbott for mentioning A/Bingo in a conference presentation.  The constructive criticism regarding testing drove this change.)

I have not written thorough integration tests for the syntax sugar that you get via the included helper methods, but I’ll fix that eventually.

Named conversions: Previously, all A/Bingo tests required one line to add the test and one line somewhere else to track conversions.  Typically, since businesses have very many tests and fairly few conversion events, this resulted in code like:

#A controller method
def purchase
#Business logic goes here.
  bingo!("new_button_test")
  bingo!("email_copy_test_january")
  bingo!("microcopy_test")
  bingo!("button_colors")
  bingo!("login_button_alignment")
end

That isn’t very DRY at all.

Now, A/Bingo will take an optional parameter :conversion (or :conversion_name) when you’re defining a test, telling it to listen to a particular named conversion. This way, you can reuse the same conversion for as many tests as you want, decreasing the lines of code needed to create most new tests from two to one.

def some_method_with_a_test
  @alternative = ab_test("some_test_name", %w{altA altB}, :conversion => "purchase")
end

def some_other_method_with_a_test
  @foo = ab_test("bar_test", %w{coke water}, :conversion => "purchase")
end

def purchase
  #Business logic goes here!
  bingo!("purchase")  #Calls conversions for both of the above tests.
end
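One way to picture what a named conversion does under the hood is a registry from conversion names to the tests listening for them. This is a hypothetical in-memory sketch, not A/Bingo's implementation: `MiniAbingo` is made up, and letting a test default to listening for its own name is my assumption for how the older one-test-one-conversion style maps onto it.

```ruby
# Hypothetical in-memory sketch of named conversions (not A/Bingo's code).
class MiniAbingo
  def initialize
    @listeners = Hash.new { |hash, name| hash[name] = [] } # conversion => tests
    @conversions = Hash.new(0)                             # test => count
  end

  # Assumption: with no conversion given, a test listens for its own name,
  # which is how the older one-bingo!-per-test style behaves.
  def ab_test(test_name, alternatives, conversion: test_name)
    tests = @listeners[conversion]
    tests << test_name unless tests.include?(test_name)
    alternatives.sample # a real framework keeps each user's choice sticky
  end

  # Scores a conversion for every test listening to this name.
  def bingo!(conversion)
    @listeners[conversion].each { |test| @conversions[test] += 1 }
  end

  def conversions_for(test_name)
    @conversions[test_name]
  end
end

abingo = MiniAbingo.new
abingo.ab_test("some_test_name", %w[altA altB], conversion: "purchase")
abingo.ab_test("bar_test", %w[coke water], conversion: "purchase")
abingo.bingo!("purchase")
abingo.conversions_for("bar_test")       # => 1
abingo.conversions_for("some_test_name") # => 1
```

The point of the design is that adding a test touches one line, while the single `bingo!("purchase")` in the purchase action never has to change.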

A/Bingo handles test names with spaces in them more gracefully: Although I still don’t recommend using them, A/Bingo has been improving its handling of test names which have a space in them.  (I don’t recommend it because some cache stores, particularly memcached, do not support this well.)
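If you do use spaces, one defensive option is to normalize test names before they become cache keys. This helper is hypothetical (A/Bingo's own handling may differ); it relies on memcached's text-protocol rules that keys may not contain whitespace or control characters and may not exceed 250 bytes.

```ruby
require "digest/sha1"

# Hypothetical helper (not part of A/Bingo): turn an arbitrary test name into
# a key memcached will accept.
MAX_MEMCACHED_KEY = 250

def safe_cache_key(prefix, test_name)
  key = "#{prefix}_#{test_name.gsub(/[[:space:][:cntrl:]]+/, '_')}"
  return key if key.bytesize <= MAX_MEMCACHED_KEY
  # Too long: hash instead of truncating, so distinct long names stay distinct.
  "#{prefix}_#{Digest::SHA1.hexdigest(test_name)}"
end

safe_cache_key("Abingo", "button colors") # => "Abingo_button_colors"
```

The trade-off is that `button colors` and `button_colors` map to the same key, which is one more reason to stick to names without spaces.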

Official support for Redis: Assaf Arkin picked Redis for his awesome Vanity project (which also does A/B testing for Rails, among other things), which inspired me to take a look into it.  It appears to be a much, much better alternative for a key/value store than MemcacheDB, which is what I use for persistence.  A/Bingo has always accepted any cache store that Rails does, but I want to make it explicitly clear that I run tests against Redis, Memcached, and MemcacheDB. Just add the following to your environment:

#Goes in environment.rb
config.gem  'ezmobius-redis-rb',
  :source => 'http://gems.github.com',
  :lib => false

config.gem  'jodosha-redis-store',
  :source => 'http://gems.github.com',
  :lib => 'redis-store'

#Goes in whatever environment you're using:
require 'redis-store'
Abingo.cache = ActiveSupport::Cache::RedisStore.new

I intend to migrate my own deployment to Redis when it becomes reasonably convenient to do so.

Versioning: Previously I’ve just released patches to the A/Bingo git repository when I got done coding them, but I feel that is suboptimal now that there are substantial deployments which I could potentially break with changes.  So, here’s the skinny: A/Bingo is now, as of this blog post, 1.0.0.  I’ll communicate breaking changes by bumping that number up.  If it goes up by a tenth or more, expect that you need to re-run the migrations and that you will probably lose data on any tests in-progress, so plan ahead for that.  Version increases in that last number should be safe to apply directly.
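The policy above is simple enough to encode in a deploy script. This is a hypothetical helper, not something shipped with A/Bingo; it only compares the first two numbers, per the rule that a bump of a tenth or more means re-running migrations.

```ruby
# Hypothetical deploy-script helper (not part of A/Bingo) encoding the
# versioning rule above.
def upgrade_risk(from, to)
  major_minor = ->(version) { version.split(".").first(2).map(&:to_i) }
  if major_minor.call(from) == major_minor.call(to)
    :safe               # only the last number changed
  else
    :rerun_migrations   # a tenth or more: expect to lose in-progress test data
  end
end

upgrade_risk("1.0.0", "1.0.2") # => :safe
upgrade_risk("1.0.0", "1.1.0") # => :rerun_migrations
```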

I do not anticipate breaking the published A/Bingo API (i.e. methods mentioned in the docs) until at least v2.0.0, if ever, so upgrading A/Bingo should almost never cause you to need to update your own code.

How To Contribute

I would like to thank everyone who has submitted bug reports and patches. As usual, I’m always happy to get bug reports or feature requests. If you’d like to contribute code, make it available via git anywhere you please, and then send me an email telling me about it.

How Do I…

If the question isn’t answered in the (copious) documentation, feel free to ask me over email. If your business has particular needs for A/Bingo or you just want to talk A/B testing strategy with somebody who breathes it, I’m available for consulting engagements starting April 1st.

You Should Be Doing A/B Testing

I really can’t stress this enough: A/B testing is an easy, reproducible process that you can use to improve your marketing, website copy, product, user experience, etc. If you haven’t started yet, take A/Bingo, Vanity, or your other framework of choice for a spin. Setup takes about five minutes, and soon after that you’ll be getting actionable data which you can use to make money.