Falsehoods Programmers Believe About Names

[This post has been translated into Japanese by one of our readers: 和訳もあります。]

John Graham-Cumming wrote an article today complaining about how a computer system he was working with described his last name as having invalid characters.  It of course does not, because anything someone tells you is their name is — by definition — an appropriate identifier for them.  John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.

I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them.  (Most people call me Patrick McKenzie, but I’ll acknowledge as correct any of six different “full” names, and many systems I deal with will accept precisely none of them.) Similarly, I’ve worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them.  I have never seen a computer system which handles names properly and doubt one exists, anywhere.

So, as a public service, I’m going to list assumptions your systems probably make about names.  All of these assumptions are wrong.  Try to make fewer of them next time you write a system which touches names.

  1. People have exactly one canonical full name.
  2. People have exactly one full name which they go by.
  3. People have, at this point in time, exactly one canonical full name.
  4. People have, at this point in time, one full name which they go by.
  5. People have exactly N names, for any value of N.
  6. People’s names fit within a certain defined amount of space.
  7. People’s names do not change.
  8. People’s names change, but only at a certain enumerated set of events.
  9. People’s names are written in ASCII.
  10. People’s names are written in any single character set.
  11. People’s names are all mapped in Unicode code points.
  12. People’s names are case sensitive.
  13. People’s names are case insensitive.
  14. People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
  15. People’s names do not contain numbers.
  16. People’s names are not written in ALL CAPS.
  17. People’s names are not written in all lower case letters.
  18. People’s names have an order to them.  Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
  19. People’s first names and last names are, by necessity, different.
  20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
  21. People’s names are globally unique.
  22. People’s names are almost globally unique.
  23. Alright alright but surely people’s names are diverse enough such that no million people share the same name.
  24. My system will never have to deal with names from China.
  25. Or Japan.
  26. Or Korea.
  27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
  28. That Klingon Empire thing was a joke, right?
  29. Confound your cultural relativism!  People in my society, at least, agree on one commonly accepted standard for names.
  30. There exists an algorithm which transforms names and can be reversed losslessly.  (Yes, yes, you can do it if your algorithm returns the input.  You get a gold star.)
  31. I can safely assume that this dictionary of bad words contains no people’s names in it.
  32. People’s names are assigned at birth.
  33. OK, maybe not at birth, but at least pretty close to birth.
  34. Alright, alright, within a year or so of birth.
  35. Five years?
  36. You’re kidding me, right?
  37. Two different systems containing data about the same person will use the same name for that person.
  38. Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
  39. People whose names break my system are weird outliers.  They should have had solid, acceptable names, like 田中太郎.
  40. People have names.

This list is by no means exhaustive.  If you need examples of real names which disprove any of the above commonly held misconceptions, I will happily introduce you to several.  Feel free to add other misconceptions in the comments, and refer people to this post the next time they suggest a genius idea like a database table with a first_name and last_name column.

Detecting Bots with Javascript for Better A/B Test Results

I am a big believer in not spending time creating features until you know customers actually need them.  The same goes for OSS projects: there is no point in overly complicating things until “customers” tell you they need to be a little more complicated.  (Helpfully, here some customers are actually capable of helping themselves… well, OK, it is theoretically possible at any rate.)

Some months ago, one of my “customers” for A/Bingo (my OSS Rails A/B testing library) told me that it needed to exclude bots from the counts.  At the time, all of my A/B tests were behind signup screens, so essentially no bots were executing them.  I considered the matter, and thought “Well, since bots aren’t intelligent enough to skew A/B test results, they’ll be distributed evenly over all the items being tested, and since A/B tests measure for difference in conversion rates rather than measuring absolute conversion rates, that should come out in the wash.”  I told him that.  He was less than happy about that answer, so I gave him my stock answer for folks who disagree with me on OSS design directions: it is MIT licensed, so you can fork it and code the feature yourself.  If you are too busy to code it, that is fine, I am available for consulting.

This issue has come up a few times, but nobody was sufficiently motivated about it to pay my consulting fee (I love when the market gives me exactly what I want), so I put it out of my mind.  However, I’ve recently been doing a spate of run-of-site A/B tests with the conversion being a purchase, and here the bots really are killers.

For example, let’s say that in the status quo I get about 2k visits a day and 5 sales, which are not atypical numbers for summer.  To discriminate between that and a conversion rate 25% higher, I’d need about 56k visits, or a month of data, to hit the 95% confidence interval.  Great.  The only problem is that A/Bingo doesn’t record 2k visits a day.  It records closer to 8k visits a day, because my site gets slammed by bots quite frequently.  This decreases my measured conversion rate from .25% to .0625%.  (If these numbers sound low, keep in mind that we’re in the offseason for my market, and that my site ranks for all manner of longtail search terms due to the amount of content I put out.  Many of my visitors are not really prospects.)

Does This Matter?

I still think that, theoretically speaking, since bots aren’t intelligent enough to convert at different rates over the alternatives, the A/B testing confidence math works out pretty much identically.  Here’s the formula for the Z statistic which I use for testing:

$$Z = \frac{cr_A - cr_B}{\sqrt{\frac{cr_A(1 - cr_A)}{n_A} + \frac{cr_B(1 - cr_B)}{n_B}}}$$

The CR stands for Conversion Rate and n stands for sample size, for the two alternatives used.  If we increase the sample sizes by some constant factor X (bot visits, which never convert), we would expect the equation to turn into:

$$Z = \frac{\frac{cr_A}{X} - \frac{cr_B}{X}}{\sqrt{\frac{\frac{cr_A}{X}\left(1 - \frac{cr_A}{X}\right)}{n_A X} + \frac{\frac{cr_B}{X}\left(1 - \frac{cr_B}{X}\right)}{n_B X}}}$$

We can factor out 1/X from the numerator and bring it to the denominator (by inverting it).  Yay, grade school.

$$Z = \frac{cr_A - cr_B}{X\sqrt{\frac{\frac{cr_A}{X}\left(1 - \frac{cr_A}{X}\right)}{n_A X} + \frac{\frac{cr_B}{X}\left(1 - \frac{cr_B}{X}\right)}{n_B X}}}$$

Now, by the magic of high school algebra, moving the X inside the radical as X²:

$$Z = \frac{cr_A - cr_B}{\sqrt{\frac{X^2 \cdot \frac{cr_A}{X}\left(1 - \frac{cr_A}{X}\right)}{n_A X} + \frac{X^2 \cdot \frac{cr_B}{X}\left(1 - \frac{cr_B}{X}\right)}{n_B X}}}$$

If I screw this up the math team is *so* disowning me:

$$Z = \frac{cr_A - cr_B}{\sqrt{\frac{cr_A\left(1 - \frac{cr_A}{X}\right)}{n_A} + \frac{cr_B\left(1 - \frac{cr_B}{X}\right)}{n_B}}}$$

Now, if you look carefully at that, it is not the same equation as we started with.  How did it change?  Well, the complement of the conversion rate, (1 – cr), became (1 – cr/X), which is closer to 1 than it was previously.  (You can verify this by taking the limit as X approaches infinity.)  Getting closer to 1 means the terms under the radical in the denominator get bigger, which means the denominator as a whole gets modestly bigger, which means the Z score gets modestly smaller, which could possibly hurt the calculation we’re making.

So, assuming I worked my algebra right here, the intuitive answer that I have been giving people for months is wrong: bots do bork statistical significance testing, by artificially depressing z scores and thus turning statistically significant results into null results at the margin.
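To sanity-check the algebra numerically, here is a small Ruby sketch.  The conversion rates and sample sizes are made up for illustration; the function is just the two-proportion Z statistic from above:

```ruby
# Two-proportion z statistic: difference in conversion rates over its standard error.
def z_statistic(cr_a, n_a, cr_b, n_b)
  (cr_a - cr_b) / Math.sqrt(cr_a * (1 - cr_a) / n_a + cr_b * (1 - cr_b) / n_b)
end

# Clean, bot-free traffic: two alternatives with slightly different conversion rates.
z_clean = z_statistic(0.0030, 28_000, 0.0020, 28_000)

# Add three bot visits per human visit (X = 4): sample sizes quadruple,
# measured conversion rates drop to a quarter, raw conversions stay the same.
x = 4.0
z_bots = z_statistic(0.0030 / x, 28_000 * x, 0.0020 / x, 28_000 * x)

# z_bots comes out slightly below z_clean, exactly as the algebra predicts:
# bots nudge borderline results back under the significance threshold.
```

At conversion rates this low the depression is small, but it is strictly in the wrong direction, and it grows with X.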

So what can we do about it?

The Naive Approach

You might think you can catch most bots with a simple User-Agent check.  I thought that, too.  As it turns out, that is catastrophically wrong, at least for the bot population that I deal with.  (Note that since keyword searches would suggest that my site is in the gambling industry, I get a lot of unwanted attention from scrapers.)  It barely got rid of half of the bots.

The More Robust Approach

One way we could try restricting bots is with a CAPTCHA, but it is a very bad idea to force all users to prove that they are human just so that you can A/B test them.  We need something totally automated which is difficult for bots to do.

Happily, there is an answer for that: arbitrary Javascript execution.  While Googlebot (+) and a (very) few other cutting edge bots can execute Javascript, doing it on web scales is very resource intensive, and also requires substantially more skill for the bot-maker than scripting wget or your HTTP library of choice.

+ What, you didn’t know that Googlebot could execute Javascript?  You need to make more friends with technically inclined SEOs.  They do partial full evaluation (i.e. executing all of the Javascript on a page, just like a human would) and partial evaluation by heuristics (i.e. grep through the code and make guesses without actually executing it).  You can verify full evaluation by taking the method discussed in this blog post and tweaking it a little bit to use GETs rather than POSTs, then waiting for Googlebot to show up in your access logs for the forbidden URL.  (Seeing the heuristic approach is easier — put a URL in syntactically live but logically dead code in Javascript, and watch it get crawled.)

To maximize the number of bots we catch (and hopefully restrict it to Googlebot, who almost always correctly reports its user agent), we’re going to require the agent to perform three tasks:

  1. Add two random numbers together.  (Easy if you have JS.)
  2. Execute an AJAX request via Prototype or jQuery.  (Loading those libraries is, hah, “fairly challenging” to do without actually evaluating them.)
  3. Execute a POST.  (Googlebot should not POST.  It will do all sorts of things for GETs, though, including guessing query parameters that will likely let it crawl more of your site.  A topic for another day.)

This is fairly little code.  Here is the Prototype example:


  var a=Math.floor(Math.random()*11);
  var b=Math.floor(Math.random()*11);
  var x=new Ajax.Request('/some-url', {parameters:{a: a, b: b, c: a+b}});

and in jQuery:


  var a=Math.floor(Math.random()*11);
  var b=Math.floor(Math.random()*11);
  var x=jQuery.post('/some-url', {a: a, b: b, c: a+b});

Now, server side, we take the parameters a, b, and c, and we see if they form a valid triplet.  If so, we conclude they are human.  If not, we continue to assume that they’re probably a bot.
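The server-side check is nearly a one-liner.  A minimal Ruby sketch (the helper name and parameter handling here are my own illustration, not A/Bingo’s actual API):

```ruby
# Returns true if the submitted a/b/c parameters form a valid triplet,
# i.e. the client actually executed the Javascript that computed c = a + b.
def valid_human_proof?(params)
  return false unless params["a"] && params["b"] && params["c"]
  a, b, c = params.values_at("a", "b", "c").map { |v| Integer(v, exception: false) }
  return false if [a, b, c].any?(&:nil?)   # reject non-numeric garbage
  a + b == c
end

valid_human_proof?("a" => "3", "b" => "4", "c" => "7")  # => true
valid_human_proof?("a" => "3", "b" => "4", "c" => "9")  # => false
```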

Note that I could have been a bit harsher on the maybe-bot and given them a problem which trusts them less: for example, calculate the MD5 of a value that I randomly picked and stuffed in the session, so that I could reject bots which hypothetically tried to replay previous answers, or bots hand-coded to “knock” on a=0, b=0, c=0 prior to accessing the rest of my site.  However, I’m really not that picky: this isn’t to keep a dedicated adversary out, it is to distinguish the overwhelming majority of bots from humans. (Besides, nobody gains from screwing up my A/B tests, so I don’t expect there to be dedicated adversaries. This isn’t a security feature.)

You might have noticed that I assume humans can run Javascript.  (My site breaks early and often without it.)  While it is not specifically designed that Richard Stallman and folks running NoScript can’t influence my future development directions, I am not overwrought with grief at that coincidence.

Tying It Together

So now we can detect who can and who cannot execute Javascript, but there is one more little detail: we learn about your ability to execute Javascript potentially after you’ve started an A/B test.  For example, it is quite possible (likely, in fact) that the first page you visit has an A/B test in it somewhere, and that the AJAX call which registers your humanness will arrive after we have already counted (or not counted) your participation in the A/B test.

This has a really simple fix.  A/Bingo already tracks which tests you’ve previously participated in, to avoid double-counting.  In “discriminate against bots” mode, it tracks your participation (and conversions) but does not add them to the totals immediately unless you’ve previously proven yourself to be a human.  When you’re first marked as a human, it takes a look at the tests you’ve previously participated in (prior to turning human), and scores your participation for them after the fact.  Your subsequent tests will be scored immediately, because you’re now known to be human.
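That bookkeeping looks roughly like this in Ruby.  The class and method names are illustrative, not A/Bingo’s real internals:

```ruby
# Tracks one visitor's test participation, only adding it to the public
# totals once the visitor has proven themselves human.
class Participant
  attr_reader :scored

  def initialize
    @human   = false
    @pending = []   # tests seen before the proof-of-humanity arrived
    @scored  = []   # tests already added to the totals
  end

  def participate!(test_name)
    return if @pending.include?(test_name) || @scored.include?(test_name)
    if @human
      score!(test_name)
    else
      @pending << test_name   # remember it, but don't count it yet
    end
  end

  # Called when the a/b/c proof-of-Javascript check passes.
  def mark_human!
    @human = true
    @pending.each { |t| score!(t) }   # score earlier participation after the fact
    @pending.clear
  end

  private

  def score!(test_name)
    @scored << test_name
    # ...increment the chosen alternative's participant count here...
  end
end
```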

Folks who are interested in seeing the specifics of the ballet between the Javascript and server-side implementation can, of course, peruse the code at their leisure by git-ing it from the official site.  If you couldn’t care less about implementation details but want your A/B tests to be bot-proof ASAP, see the last entry in the FAQ for how to turn this on.

Other Applications

You could potentially use this in a variety of contexts:

1) With a little work, it is a no-interaction-required CAPTCHA for blog commenting and similar applications. Let all users, known-human and otherwise, immediately see their comments posted, but delay public posting of the comments until you have received the proof of Javascript execution from that user. (You’ll want to use slightly trickier Javascript, probably requiring state on your server as well.) Note that this will mean your site will be forever without the light of Richard Stallman’s comments.

2) Do user discrimination passively all the time. When your server hits high load, turn off “expensive” features for users who are not yet known to be human. This will stop performance issues caused by rogue bots gone wild, and also give you quite a bit of leeway at peak load, since bots are the majority of user agents. (I suppose you could block bots entirely during high load.)

3) Block bots from destructive actions, though you should be doing that anyway (by putting destructive actions behind a POST and authentication if there is any negative consequence to the destruction).
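The load-shedding idea in (2) might be sketched like so; the method name and threshold are hypothetical, not from any real codebase:

```ruby
HIGH_LOAD_THRESHOLD = 0.8  # fraction of capacity; tune to taste

# Serve expensive features to proven humans always, and to unproven
# (possibly bot) agents only while the server has capacity to spare.
def show_expensive_feature?(session, current_load)
  return true if session[:known_human]  # set when the a/b/c Javascript proof arrived
  current_load < HIGH_LOAD_THRESHOLD
end
```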

The Most Radical A/B Test I've Ever Done

About four years ago, I started offering Bingo Card Creator for purchase.  Today, I stopped offering it.

That isn’t true, strictly speaking.  The original version of Bingo Card Creator was a downloadable Java application.  It has gone through a series of revisions over the years, but is still there in all its Swing-y glory.  Last year, I released an online version of Bingo Card Creator, which is made through Rails and AJAX.

My personal feeling (backed by years of answering support emails) is that my customers do not understand the difference between downloadable applications and web applications, so I sold Bingo Card Creator without regard to the distinction.  Everyone, regardless of which they are using, goes to the same purchasing page, pays the same price, and is entitled to use either (or both) at their discretion.  It is also sold as a one-time purchase, which is highly unusual for web applications.  This is largely because I was afraid of rocking the boat last summer.

The last year has taught me quite a bit about the difference between web applications and downloadable applications.  To wit: don’t write desktop apps.  The support burden is worse, the conversion rates are lower, the time through the experimental loop is higher, and they retard experimentation in a million and one ways.

Roughly 78% of my sales come from customers who have an account on the online version of the software.  I have tried slicing the numbers a dozen ways (because tracking downloads to purchases is an inexact science in the extreme), and I can’t come up with any explanation other than “The downloadable version of the software is responsible for a bare fraction of your sales.”  I’d totally believe that, too: while the original version of the web application was rough and unpolished, after a year of work it now clocks the downloadable version in almost every respect.

I get literally ten support emails about the downloadable application for every one I get about the web application, and one of the first things I suggest to customers is “Try using the web version, it will magically fix that.”

  • I’m getting some funky Java runtime error.  Try using the web application.
  • I can’t install things on this computer because of the school’s policies.  Try using the web application.
  • How do I copy the files to my niece’s computer?  By the way it is a Mac and I use a Yahoo.  Try using the web application.

However, I still get thousands of downloads a month… and they’re almost all getting a second-best experience and probably costing me money.

Thus The Experiment

I just pushed live an A/B test which was complex, but not difficult.  Testers in group A get the same experience they got yesterday, testers in group B get a parallel version of my website in which the downloadable version never existed.  Essentially, I’m A/B testing dropping a profitable product which has a modest bit of traction and thousands of paying customers.

This is rather substantially more work than typical “Tweak the button” A/B tests: it means that I had to make significant sitewide changes in copy, buttons, calls to action, ordering flow, page architecture, support/FAQ pages, etc etc.  I gradually moved towards this for several months on the day job, refactoring things so that I could eventually make this change in a less painful fashion (i.e. without touching virtually the entire site).  Even with that groundwork laid, when I “flipped the switch”  just now it required changing twenty files.

Doing This Without Annoying Customers

I’m not too concerned about the economic impact of this change: the A/B test is mostly to show me whether it is modestly positive or extraordinarily positive.  What has kept me from doing it for the last six months is the worry that it would inconvenience customers who already use the downloadable version.  As a result, I took some precautions:

The downloadable version isn’t strictly speaking EOLed.  I’ll still happily support existing customers, and will keep it around in case folks want to download it again.  (I don’t plan on releasing any more versions of it, though.  In addition to being written in Java, a language I have no desire to use in a professional capacity anymore, the program is a huge mass of technical debt.  The features I’d most keenly like to add would require close to a whole rewrite of the most complex part of the program… and wouldn’t generate anywhere near an uptick in conversion large enough to make that a worthwhile use of my time, compared to improving the website, web version, or working on other products like Appointment Reminder.)

I extended A/Bingo (my A/B testing framework) to give a way to override the A/B test choices for individual users.  I then used this capability to intentionally exclude from the A/B test (i.e. show the original site and not count) folks who hit a variety of heuristics suggesting that they probably already used the downloadable version.  One obvious one is that they’re accessing the site from the downloadable version.  There is also a prominent link in the FAQ explaining where it went, and clicking a button there will show it.  I also have a URL I can send folks to via email to accomplish the same thing, which was built with customer support in mind.
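The exclusion logic amounts to a handful of heuristics.  A sketch of the shape of it, with made-up heuristics and a made-up method name (not A/Bingo’s actual override API):

```ruby
# Decide whether to exclude this visitor from the "kill the download" test:
# show them the original site and don't count them in the totals.
def exclude_from_download_test?(request, session)
  # They're hitting the site from inside the downloadable app itself.
  return true if request[:user_agent].to_s.include?("BingoCardCreator")
  # Support emailed them a magic link, or they clicked the FAQ's
  # "where did the download go?" button.
  return true if request[:params]["show_download_version"]
  return true if session[:prefers_download]
  false
end
```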

I also scheduled this test to start during the dog days of summer.  Seasonally, my sales always massively crater during the summer, which makes it a great time to spring big changes (like, e.g., new web applications).  Most of my customers won’t be using the software again until August, and that gives me a couple of months to get any kinks out of the system prior to them being seen by the majority of my user base.

My Big, Audacious Goal For This Test

I get about three (web) signups for every two downloads currently, and signups convert about twice as well as downloads do.  (Checking my math, that would imply a 3:1 ratio of sales, which is roughly what I see.)  If I was able to convert substantially all downloads to signups, I would expect to see sales increase by about 25%.
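Spelling that arithmetic out (the absolute conversion rate cancels, so I set it to 1):

```ruby
r = 1.0                          # downloads' trial-to-sale rate; cancels out below
signups, downloads = 3.0, 2.0    # ~3 web signups for every 2 downloads

web_sales      = signups * (2 * r)        # signups convert twice as well
download_sales = downloads * r
sales_ratio = web_sales / download_sales  # 3:1, roughly what I actually see

sales_now   = signups * (2 * r) + downloads * r  # 6r + 2r = 8r
sales_after = (signups + downloads) * (2 * r)    # everyone signs up: 10r
lift = sales_after / sales_now - 1               # 10r/8r - 1 = 25% increase
```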

There are a couple of follow-on effects that would have:

  • I think offering two choices probably confuses customers and decreases the total conversion rate.  Eliminating one might help.
  • Consolidating offerings means that work to improve conversion rates automatically helps all prospects, rather than just 60%.

Magic Synergy Of Conversion Optimization And AdWords

Large systemic increases in conversion rates let me walk up AdWords bids.  For example, I use Conversion Optimizer.  Essentially, rather than bidding on a cost per click basis I tell Google how much I’m willing to pay for a signup or trial download.  I tell them 40 cents, with the intention of them actually getting the average at around 30 cents, which implies (given my conversion from trials/signups to purchase) that I pay somewhere around $12 to $15 for each $30 sale.  Working back from 30 cents through my landing page conversion rate, it turns out I pay about 6 cents per click.

Now, assuming my landing page conversion is relatively constant but my trial to sale conversion goes up by 25%, instead of paying $12 to $15 a sale I’d be paying $9.60 to $12 a sale.  I could just pocket the extra money, but rather than doing that, I’m probably going to tell Google “Alright, new deal: I’ll pay you up to 60 cents a trial”, actually end up paying about 40 cents, and end up paying about 8 cents per click.  The difference between 6 and 8 will convince Google to show my ads more often than those of some competitors, increasing the number of trials I get per month out of them.  (And, not coincidentally, my AdWords bill.  Darn, that is a bloody brilliant business model, where they extract rent every time I do hard work.  Oh well, I still get money, too.)
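The cost-per-sale arithmetic in those two paragraphs, written out:

```ruby
cost_per_trial = 0.30                 # what I actually pay Google per signup/trial
cost_per_sale  = [12.0, 15.0]         # observed range, on a $30 sale

# $12-$15 per sale at 30 cents per trial implies 40-50 trials per sale.
trials_per_sale = cost_per_sale.map { |c| c / cost_per_trial }

# A 25% lift in trial-to-sale conversion means each sale needs 1/1.25
# as many trials, so cost per sale falls to $9.60-$12.00.
new_cost_per_sale = cost_per_sale.map { |c| c / 1.25 }
```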

We’ll see if this works or not.  As always, I’ll be posting about it on my blog.  I’m highly interested in both the numerical results of the A/B test as well as whether this turns out to be a win-win for my customers and myself or whether it will cause confusion at the margin.  I’m hoping not, but can’t allow myself to stay married to all old decisions just out of a desire to be consistent.