Productizing Twilio Applications

This post includes video, slides, and a full-text writeup. I recommend bookmarking it if you’re on an iPhone right now.

I make extensive use of Twilio (a platform company that lets you do telephony with an API) in running Appointment Reminder, my core product focus at the moment.  (Wait around a day or two and I’ll tell you a bit about how it is doing in my annual end-of-the-year wrapup.)

Twilio has a very passionate developer community and fairly good documentation on their website, but I’ve sometimes been frustrated by it, because it teaches you the bare minimum to get phones ringing.  That is truly a wonderful thing and a necessary step to building a telephony application.  However, if you continue developing your application in the way that the Quick Start guides suggest, you will routinely run into problems regarding testing your code, maintaining it, securing it, and generally providing the best possible experience to your customers and the people they are calling.

I have a wee bit more than a year of practical experience with a Twilio application in production, so I went to TwilioConf and did a presentation about how to “productize” Twilio applications: to take them past the “cool weekend hack” stage and make them production-ready.  Twilio has graciously released videos of many of the presentations at TwilioConf, so I thought I’d write up my presentation for the benefit of folks who were not at the conference.

The Video (~30 Minutes)

Twilio Conference 2011: Patrick McKenzie – Productizing Twilio Apps from Twilio on Vimeo.

The Presentation (40 Slides)

The Writeup

Why I Think Twilio Will Take Over The World

(This was not actually in the presentation, because I didn’t have enough time for it, but I sincerely believe it and want to publish it somewhere.)

I think Twilio is, far and away, the most exciting technology I’ve ever worked with.  The world needs cat photos, local coupons, and mobifotosocial games, too, but it needs good telephony systems a lot more, as evidenced by companies paying billions of dollars for them.

Additionally, Twilio is the nascent, embryonic form of the first Internet that a billion people are going to have access to, because Twilio turns every phone into a smartphone.  The end-game for Zynga’s take-over-the-world vision is the human race slaved to artificial dopamine treadmills.  The endgame for Twilio’s vision is that every $2 handset in Africa is the moral equivalent of an iPhone.  I know which future I want to support.

Smartphones aren’t smart because of anything on the phones themselves, they’re smart because they speak HTTP and thus get always-on access to a universe of applications which are improving constantly.  Twilio radically reduces the amount of hardware support a phone needs to speak HTTP — it retroactively upgrades every phone in the world to do so.  After that, all you need is the application logic.  And what application logic there is — because the applications live on web servers, they have access to all the wonderful infrastructure built to run the Internet, from APIs that let you get Highly Consequential Data like e.g. weather reports or stock prices or whatever, to easy integration with systems which were never built to have a phone operating as part of them.

You can’t swing a stick in a business without hitting a problem which a phone application makes great sense for.  I filled up three pages of a notebook with them in just a week after being exposed to Twilio for the first time.  Order status checking for phone/fax/mail orders.  Integrated CRMs for phone customer service representatives.  Flight information.  Bank balances.  Server monitoring.  Appointment reminders.  Restaurant reservations.  Local search.  Loyalty programs.  Time card systems.  Retail/service employee support systems.  Shift management.  The list goes on and on and on.

Seriously, start writing Twilio apps.

What This Presentation Will Actually Cover

I’m tremendously optimistic about the futures of Twilio and the eventual futures of companies which make Twilio applications, but I’m pessimistic about your immediate future as an engineer writing a Twilio app, because it is going to be filled with pain.  You’re probably going to make some choices which will cause you and your customers intense amounts of suffering.  I’ve already made several of those mistakes, so use me as the inoculatory cowpox and avoid dying.

Crying In A Cold, Dark Room

Back in February 2011, I moved from my previous apartment to my current house.  I unwisely decided to push a trivial code change prior to boxing things up.  This trivial code change did not immediately take down the server, but did cause one component (queue worker processes) to fail some hours later.  The most immediate consequence of this was that outgoing appointment reminder phone calls / SMSes / emails failed to go out.  Since I was busy moving, I did not notice the automated messages about this failure.

When I discovered the failure (8 hours into customer-visible downtime), I panicked.  Rather than reasoning through what had happened, I reverted the code change and pushed reset on the queue worker processes.  This worked, and the queue quickly went from 2,000 pending jobs to 0 pending jobs.  I then went to bed.

At roughly 3 AM, I woke up with a vague feeling of unease about how I had handled the situation, and checked my email.  My customers (small businesses using AR to talk to their clients) had left several incensed messages about how their clients had reported receiving dozens of unceasing calls on behalf of their business, in a row, at 7:30 PM the night before (right after I had restarted the queue workers).

Here was the error: my application assumed that the queue would always be almost clear, because queue workers operate continuously.  A cron job was checking the DB every 5 minutes to see whether a particular client had been contacted about her appointment yet.  If she hadn’t, the cron job pushed another job to the queue to make the phone call / SMS / email.  When the queue came back up, each client received approximately 100 queued events simultaneously.  These jobs did not themselves check, at the start of the job, whether the job was still valid, because the application assumed that the cron job would only schedule valid reminder requests and not execute 100 times in between queue clearings.

This resulted in approximately 15 people receiving a total of 600 phone calls, 400 SMSes, and 200 emails, in approximately a 5 minute period of time.

There are a variety of ways I could have avoided causing this problem for my customers:

  1. Don’t make code changes prior to planned unavailability, even if they look trivial.
  2. Don’t ever leave your phone that gets emergency messages out of your pocket.
  3. Switch to idempotent queues, so that adding the same job multiple times does not result in multiple copies of the job.
  4. Add per-client rate limits, so that application logic errors don’t cause runaway contact attempts.
  5. Add failsafes for historically unprecedented levels of activity, shutting down the system until I could manually check it for correctness.
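Items 3 and 4 can be sketched in a few lines of plain Ruby. This is a minimal in-memory illustration, not the code Appointment Reminder runs: the class names, the job key format, and the limits are all hypothetical, and a real system would keep this state in the queue's backing store rather than in process memory.

```ruby
require "set"

# Hypothetical in-memory queue: pushing a key that is already pending is a no-op.
class IdempotentQueue
  def initialize
    @keys = Set.new   # keys of jobs currently pending
    @jobs = []        # pending jobs, oldest first
  end

  # Returns true if the job was enqueued, false if an identical job is pending.
  def push(key, payload)
    return false unless @keys.add?(key)   # Set#add? returns nil for duplicates
    @jobs << [key, payload]
    true
  end

  # Removes and returns the oldest payload, or nil when the queue is empty.
  def pop
    key, payload = @jobs.shift
    return nil if key.nil?
    @keys.delete(key)
    payload
  end

  def size
    @jobs.size
  end
end

# Hypothetical per-client cap on contact attempts within a time window.
class ContactRateLimiter
  def initialize(max_contacts:, window_seconds:)
    @max = max_contacts
    @window = window_seconds
    @history = Hash.new { |hash, client_id| hash[client_id] = [] }
  end

  # Records the attempt and returns true, or returns false when over the limit.
  def allow?(client_id, now = Time.now)
    recent = @history[client_id].select { |time| now - time < @window }
    allowed = recent.size < @max
    recent << now if allowed
    @history[client_id] = recent
    allowed
  end
end

queue = IdempotentQueue.new
100.times { queue.push("reminder:appointment-42", { client_id: 7 }) }
puts queue.size   # prints 1
```

With a queue like this, a cron job that fires 100 times while the workers are down leaves one pending reminder instead of 100, and the rate limiter bounds the damage if some other logic error slips past it.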

Testing Twilio Applications

Unit testing and integration testing are virtually required to produce production-quality Twilio applications, and will make it much less likely for you to create catastrophically bad bugs in production.  Unfortunately, testing Twilio applications is much harder than testing traditional CRUD web applications, because of how TwiML is different than HTML (in terms of how minor syntax errors actually cause business problems), how it is not easy to replicate telephone operation in integration testing, and because Twilio sometimes has poor separation of concerns between the MVC of a web application, the Twilio helper library, and the Twilio service itself.

Twilio testing is inherently dangerous, because non-production environments (testing, staging, development, etc) could conceivably generate actual, real-world phone calls to phone numbers which were in your database but not actually under your control.  The first and most important tip I have for Twilio testing is to make it explicitly impossible to contact anyone not on a whitelist from code when you’re not in production.  I have a quick snippet that I put in a Rails initializer which monkeypatches my Twilio library to force it to only make phone calls or SMSes to whitelisted numbers.  (I don’t suggest actually re-using this code, particularly as you may not be using Rails or the same Twilio library that I am using, but you can reuse the idea of enforcing safety in non-production environments.)
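The idea can be sketched without reference to any particular Twilio library. The following is a hypothetical guard object, not my actual initializer: the class name, the environment strings, and the sample numbers are assumptions for illustration. The point is that every outbound call or SMS passes through one choke point that refuses non-whitelisted numbers outside production.

```ruby
# Hypothetical guard: call check! before handing a number to whatever
# client library actually places the call or sends the SMS.
class OutboundGuard
  class BlockedNumberError < StandardError; end

  def initialize(environment:, whitelist:)
    @environment = environment
    @whitelist = whitelist.map { |number| normalize(number) }
  end

  # Returns the number if contacting it is allowed, raises otherwise.
  def check!(number)
    return number if @environment == "production"
    return number if @whitelist.include?(normalize(number))
    raise BlockedNumberError, "refusing to contact #{number} in #{@environment}"
  end

  private

  # Strip formatting so "+1 (555) 555-0100" and "15555550100" compare equal.
  def normalize(number)
    number.to_s.gsub(/\D/, "")
  end
end

guard = OutboundGuard.new(environment: "development", whitelist: ["+1 (555) 555-0100"])
guard.check!("15555550100")      # allowed: same digits as the whitelisted number
# guard.check!("+15555550199")   # would raise OutboundGuard::BlockedNumberError
```

Raising (rather than silently skipping the call) is a deliberate choice here: a blocked contact attempt in a test environment usually means the test data or the code path deserves a look.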

A lot of Twilio testing will, unfortunately, require manual button-pressing (or scripts which simulate button-pressing on a telephone).  This is easier to accomplish if you can expose your local development machine to the actual Internet.  There are strong security reasons why you don’t want to do this but, if you’re comfortable with doing it, LocalTunnel is a great way to actually accomplish it.

Also see the section below on Modeling Phone Calls, because it will make Twilio phone trees and call logic much more tractable to unit testing.

You Should Have A Staging Server

A staging server is just a copy of the entire production system minus the actual customers.  (You probably shouldn’t put production data on it, because staging systems are designed to break and as a result they may leak data through e.g. SQL injections.  This is an easy way to lose your DB.)  You should use firewalls and/or server rules to make the staging server inaccessible to the world (aside from Twilio and any other APIs which need to access your site for it to work), but assume you will botch this.

Staging servers are virtually mandatory for Twilio applications, because Twilio apps can fail in ways which will not be detected until they are actually accessed over the Internet.  For example, even with unit and integration testing, failing to properly deploy all audio assets (MP3 files, etc) will cause Twilio to throw hard, customer-visible errors in production.  I have automated systems which check for this now, but since that isn’t an exhaustive list of things that can go wrong in production, part of my workflow for deploying all changes on Twilio is to push them to the staging server first, and then have automated scripts exercise the core functionality of the application and ensure that it continues to work.

How To Model Phone Calls

Twilio Quick Start guides generally don’t suggest modeling phone calls explicitly, instead relying on just taking user input and doing if/then or switch statements on it.  This is ineffective for non-trivial use cases, because as the application logic gets more complicated, it will tend to accumulate lots of technical debt, be hard to visually verify for correctness, and be extremely difficult to automatically test.  Instead, you should model Twilio calls as state machines.  I am a big fan of state_machine in the Ruby world.

I’ll skip the CS201 description of what a state machine actually is.  If you didn’t take that course, Google is your friend.

You should model calls such that they start in a predictable state and user input moves the call between states, causing side effects of a) running any business logic required and b) outputting TwiML to Twilio, to continue driving the call.  This lets you replace case statements with a lot of parallel structure with well-defined transition tables within the call models.  Those models are then trivial to unit test.  Additionally, adopting coding conventions such as “the TwiML to be executed at a given state is always state_name.xml and any audio assets go in /foo/bar/state_name/*.mp3” allows you to write trivial code which will test for their presence, which will save you from having to manually go through the entire phone tree every time on the staging server to verify that refactoring didn’t break anything.
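A presence check built on that naming convention might look like the sketch below. The directory layout and the function name are hypothetical; substitute whatever convention your application actually follows.

```ruby
# Hypothetical layout: <twiml_dir>/<state>.xml and <audio_dir>/<state>/*.mp3.
# Returns the list of missing paths (or unmatched glob patterns); an empty
# list means every state has its template and at least one audio file.
def missing_call_assets(states, twiml_dir:, audio_dir:)
  states.each_with_object([]) do |state, missing|
    twiml = File.join(twiml_dir, "#{state}.xml")
    missing << twiml unless File.file?(twiml)

    audio_pattern = File.join(audio_dir, state.to_s, "*.mp3")
    missing << audio_pattern if Dir.glob(audio_pattern).empty?
  end
end

# Example use in a deploy script (paths are placeholders):
#   problems = missing_call_assets([:greeting, :confirmed],
#                                  twiml_dir: "app/views/calls",
#                                  audio_dir: "public/audio")
#   abort("Missing call assets: #{problems.join(', ')}") unless problems.empty?
```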

Additionally, state machines are much easier to reason over than masses of spaghetti code which case statements tend to produce.  For example, consider the following code, which attempts to implement the phone prompt “Press 1 to confirm your appointment, press 2 to cancel your appointment, press 3 to ask for us to contact you about your appointment.”  Spot the bug.

There are actually over six bugs in that code, above the trivial ones you probably saw with numbers not lining up to action names:

  • The Twilio API will pass this code params[:Digits] not params[:digits], which will cause an error that won’t be caught until you physically pick up the phone.
  • The comparisons of params[:digits] with integers will fail, because the parameter holds string representations of numbers (“1”, not 1).
  • There are several mistakes in mapping numbers to actions.
  • One of the action names is spelled improperly.

These are very easy to miss because our brains get lulled into a false sense of security by parallel structure.  Instead, the model should be taking care of that mapping between user input and state transitions.  This would radically simplify the code and make the controller virtually failure-free, while letting the model exhaustively unit-test possible user input, expected transitions, and business logic side effects.
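To make the shape of such a model concrete, here is a minimal plain-Ruby sketch of the confirm / cancel / contact-me prompt as a transition table. It deliberately does not use the state_machine gem, and the class, state names, and template naming are hypothetical; it only illustrates how the digit-to-action mapping can live in one table that a unit test can enumerate.

```ruby
# Minimal call model: one transition table instead of scattered case statements.
class ReminderCall
  # state => { pressed digits (as a string) => next state }
  TRANSITIONS = {
    greeting: {
      "1" => :confirmed,
      "2" => :cancelled,
      "3" => :callback_requested
    }
  }.freeze

  attr_reader :state

  def initialize
    @state = :greeting
  end

  # Keypresses arrive as strings, so the table is keyed by strings.
  # Unrecognized input leaves the state unchanged so the prompt can be replayed.
  def press(digits)
    next_state = TRANSITIONS.fetch(@state, {})[digits.to_s]
    @state = next_state if next_state
    @state
  end

  # Convention from above: the template for a state is always <state>.xml.
  def twiml_template
    "#{@state}.xml"
  end
end

call = ReminderCall.new
call.press("2")
puts call.state            # prints cancelled
puts call.twiml_template   # prints cancelled.xml
```

The controller then shrinks to roughly "look up the call, pass it the Digits parameter, render the template it names", and every mapping bug listed above becomes a one-line unit test against the table.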

State machines might seem like an unnecessary complication when you only have three branches in your code, but production Twilio applications can get very, very complicated.  Here is a state diagram from Appointment Reminder.  You do not want to have to test these transitions manually!

Dealing With Answering Machines

Dealing with the case where the phone call is answered by an answering machine or voicemail system has been the hardest application design problem for me in doing outgoing phone calls in Twilio.  The documentation suggests using an IfMachine feature, which will cause Twilio to listen to a few seconds of the phone call prior to executing your code.  They do some opaque AI magic to determine whether the entity speaking (or not speaking) in that interval is a machine or not, and tell your application whether it is talking to a machine or a human.  In my experience, this has error rates in the 20% region, and many customers intensely dislike the gap of dead air at the start of their phone calls.  Also, if the heuristic improperly detects the beep, your message will start playing early, causing the recording to be cut off in the middle.

There are several ways you could attempt to deal with this:

  • Ignore the issue and treat both machines and humans the same.  This will produce the optimal result for humans, but your system will be virtually unusable when it gets a machine.  (This happens very frequently in my use case.)
  • Force a keypress (“1”) prior to playing your message, then give all users the same message.  This will force most machines to start recording immediately, stopping the cut-off-in-the-middle problem but annoying some clients.
  • Play instructions such as “This is an automated message from $COMPANY.  To hear it, press 1.”  Assume that anyone who doesn’t press 1 in 5 seconds is a machine and play the machine message.  If they interact with the call, play the human message.  This is my preferred solution (although not actually implemented in AR publicly yet, because customers don’t really grok this issue until it bites them personally).
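The third option maps fairly directly onto TwiML's Gather verb: a keypress within the timeout sends the call to the Gather's action URL, and if the timeout passes with no input, Twilio continues with the next verb in the document. Below is a sketch of a helper that builds such a response. The helper names, the action URL, and the message text are hypothetical, and since (as noted above) I have not shipped this in AR, treat it as an untested outline of the idea.

```ruby
# Escape text for safe inclusion in XML attributes and element bodies.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Builds TwiML that waits 5 seconds for a single keypress. A keypress is sent
# to human_url; silence falls through to the message meant for machines.
def screening_twiml(company:, human_url:, machine_message:)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Gather action="#{xml_escape(human_url)}" numDigits="1" timeout="5">
        <Say>This is an automated message from #{xml_escape(company)}. To hear it, press 1.</Say>
      </Gather>
      <Say>#{xml_escape(machine_message)}</Say>
    </Response>
  XML
end

puts screening_twiml(
  company: "Example Dental",
  human_url: "https://example.com/calls/human",
  machine_message: "This is a reminder about your appointment tomorrow."
)
```

Note that the handler behind the action URL still has to inspect which digit was pressed; Gather reports any keypress, not only "1".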

There is one particular problem with recording messages on answering machines: if you give a user instructions such as “Press 1 to confirm your message” and they follow that instruction when listening to their voicemail, that keypress will not be caught by your application.  It will be caught by their voicemail system, with unpredictable results (such as deleting the message) and an absolute certainty of not doing what your keypress would normally do.  Users do not understand why this happens.  They expect your instructions to them to work.

Securing Twilio Applications

Twilio applications have a superset of the security issues of web applications.  In addition to the usual SQL injections / XSS issues / etc, use of the telephone has unique security issues associated with it.

One issue is that confidential information is only confidential until you repeat it into a telephone.  Even assuming that the phone call isn’t intercepted (which is, ahem, problematic), there are very common user errors and use cases which will cause that information to be disclosed to third parties.  For example:

  • User error in inputting telephone numbers causes the message to go to the wrong party.
  • The message goes to corporate voicemail, where it will routinely be accessible to third parties.
  • The message is played over a speakerphone / cell phone / etc within earshot of third parties.
  • The message is saved on a physical device which can predictably leave the physical control of authorized parties.
  • etc, etc

Don’t ever put confidential information into an outgoing message, unless you have an automated way to authenticate who you are speaking with.

For incoming phone calls, Caller ID is not sufficient authentication.  It can be trivially spoofed; indeed, your phone company will probably sell you a product whose sole aim is to spoof Caller ID.  Instead, you should use a circumstance where the user is already authenticated and authorized, such as a face-to-face meeting or using a username / password pair in a web application, and then give them one-time PINs to do whatever they need to do on your system.  Alternatively, you can implement an entire password system for your incoming phone calls, but users tend to hate them, so I try to keep to the one-time PIN metaphor.  (When a user does something on the AR site which requires calling the system, such as setting up a recording for a reminder, I tell them “Call 555-555-5555 and put in your Task Code 1234”, which (since it is time-sensitive) both helps me look up what they were doing on a multi-user system and also conclusively demonstrates that they were able to read a web page which already verified their identity.)
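A time-limited, single-use task code can be sketched as follows. This is an illustrative in-memory version with hypothetical names and a made-up ten-minute lifetime, not AR's implementation; a production version would persist the codes and would also need to rate-limit guesses, since four digits are easy to brute-force.

```ruby
require "securerandom"

# Hypothetical store of short-lived, single-use codes issued to users who
# have already authenticated on the web site.
class TaskCodes
  Entry = Struct.new(:user_id, :task, :expires_at)

  def initialize(ttl_seconds: 600)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Issues a 4-digit code that is not currently in use.
  def issue(user_id, task, now = Time.now)
    @entries.delete_if { |_code, entry| now > entry.expires_at }   # purge stale codes
    code = nil
    loop do
      code = format("%04d", SecureRandom.random_number(10_000))
      break unless @entries.key?(code)
    end
    @entries[code] = Entry.new(user_id, task, now + @ttl)
    code
  end

  # Returns the entry for a valid code and consumes it; nil if unknown or expired.
  def redeem(code, now = Time.now)
    entry = @entries.delete(code.to_s)
    return nil if entry.nil? || now > entry.expires_at
    entry
  end
end

codes = TaskCodes.new
code = codes.issue(42, :record_reminder)
entry = codes.redeem(code)
puts entry.task          # prints record_reminder
p codes.redeem(code)     # prints nil, because the code was already used
```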

Not in the presentation because the slide got deleted for some reason: the 4chan rule.  Even if your free trial is discovered by 4chan, the world should not become a darker, more evil place.  There exist tremendous possibilities for abuse of free-form input/output to people’s telephones.  I gate access to my trial by requiring a valid credit card, and demonstration calls and the like have strict rate limits which prevent them from being used to spam someone’s phone to death.  (I should also make it impossible to send demo calls outside of standard work hours.  This is easy to say but a little tricky to implement across multiple time zones while still encouraging legitimate use of demo calls, which is why I haven’t done it yet.)

Twilio Scales Impressively

Twilio and modern web technologies scale impressively well by the standards of traditional businesses.  However, you should probably continue to rate-limit your systems, even though you could theoretically do substantially more volume.  Many customers who ask about scaling issues do not sufficiently understand that your application scales several orders of magnitude better than their business processes.  For example, a prospective client asked if my system could handle 10,000 phone calls a month.  I told them that I could handle that in under an hour.  They were quite excited about that, but as we continued to speak about their needs, it developed that actually doing that would have crushed their business.  They would have made 10,000 phone calls in an hour, received over 1,000 callbacks, and their two full-time telephone operators would have been overwhelmed by incoming demand for their time.

Grab Bag of Random Advice

  • Never contact Twilio, or any external API, inside the HTTP request/response cycle.  Doing so imposes an unacceptable delay in performance and slaves your reliability to that of the worst performing API you use.  (Twilio has never had user-visible downtime, but some APIs I rely on have.) Queue the request and tell the browser that you’ve done so.  You can drizzle AJAX magic on your website to make this feel responsive for your users.
  • The Twilio Say verb will have a robot read your message.  This is adequate for development, but for production, people prefer listening to people.  Fiverr.com is great for finding voice actresses for $5.
  • You can’t record too much information about Twilio requests, responses, and errors.  I stuff everything in Redis these days.  I strongly wish I had started doing this earlier, rather than writing “An error happened” to a log file and being unable to determine exactly what the error was or easily figure out whose account it actually affected.
  • When in doubt, don’t make that phone call.  Design your system to fail closed.  This is a continuous discipline, but it will drastically cut down on catastrophic problems.

Wrapup

That’s it for the presentation contents.  I remain very interested in Twilio apps, and am happy to talk to you about them whenever. My contact details are trivially discoverable.

I’m going to attempt to write a more comprehensive guide to developing Twilio applications, eventually. We’ll see what form that takes — I would really like to provide people an (even) easier way to get started, but at the same time I can’t justify dropping two months of my schedule to write a traditional book on it.

I Saw An Extremely Subtle Bug Today And I Just Have To Tell Someone

This post will not help you sell more software. If you’re not fascinated by the inner workings of complex systems, go do something more important. If you are, grab some popcorn, because this is the best bug I’ve seen in years.

Have you ever been logged into a site and suddenly been asked to log in again?  This is one of those minor nuisances of using the Internet, right?  If you’re not technically inclined, you think “Hmm, ghosts in the Googles.  Oh well, here is my username and password again.”  If you are technically inclined, you might think “Hmm, funny, my cookie picked a bad time to expire, or maybe my session was getting stored in Memcached on a machine which just went down.  Oh well, here is my username and password again.”

It turns out that Bingo Card Creator has been doing this pervasively to a fraction of my users for the last few months.  I never noticed it, and no one ever complained.

Here’s the scenario: Bingo Card Creator is a Rails application, originally coded against Rails 2.1.X and then gradually updated with Rails security releases.  Like many Rails applications, it stores sessions in a cookie (using CookieStore), and uses the session to hold only very limited data.  Specifically, it holds the (critical) user ID for logged in users and the (nice to have) pre-login session ID.  I use the pre-login session ID to tie some analytics stuff together on the back end — basically, it lets me associate newly created accounts with search terms and whatnot that bring them to my site.  The exact mechanism for doing that isn’t important to this bug — you just need to understand that the session resetting is a minor nuisance if it only happens once in a blue moon, and a huge nuisance if it happens pervasively.

Subtle Indications My Site Was Borked

BCC maintains a whole lot of internal analytics, because I’m something of a stats junkie.  Because BCC is in maintenance mode this year, I don’t actually view the stats on a regular basis: as long as the server stays up and users don’t have any complaints, I let the sleeping dog lie.  (I’ve been busy with other projects.)  Anyhow, one example of such a stat is “Of recently created trial accounts, how many were referred from the Halloween bingo cards mini-site?”  For most of the year, that should be a negligible number.

Except right about on Halloween, when the mini-site sees on the order of 30,000 visits or more.  This usually sells several thousand dollars worth of software.  That is fairly hard to miss, because if several thousand dollars don’t show up in my bank account, I’d know right away.  (Sidenote: I did lose about $1,000 due to an ill-timed server crash while I was on a cross-continental plane ride right during the middle of the pre-Halloween rush. Oof.)  So naturally, several thousand dollars implies a hundred or more sales (at $30 each) which implies thousands of trials, right?

Well, my internal analytics code was telling me that the Halloween site had referred ~100 trials of which 6 converted.   Which means that I should have expected a $200 bump in my bank balance.  Which was not what happened.

I mentally filed this away under “Hmm, that’s odd” but didn’t investigate immediately because I had not lost any money (or so I thought) and was busy that week.  Then recently, after doing an unrelated code push (I integrated Stripe, it is awesome, full details later), I did my usual post-deploy smoke test and, after creating a new account, I got suddenly logged out of the application.

“Hmm, that’s odd.”  And I tried it again, twice, couldn’t produce the error, and mentally wrote it off to gremlins.

In Which I Become Doubtful Of The Existence Of Gremlins

Four hours ago, my brain suddenly decided to put these facts together. The discrepancy for the sales statistics strongly suggests that, prior to accounts getting created, the session was getting cleared.  This meant that, when the account actually got created, the referrer was not associated with the account in the DB, which threw off subsequent stats gathered by my internal analytics.  Sessions getting randomly cleared would also cause the user to become instantly signed out.

I tried to reproduce the problem in development mode and was pervasively unable to do so.  Then I started trying to reproduce it on the live site and was able to, sporadically, but only in Incognito Mode in Chrome, and only if I clicked fairly fast.  (Don’t ask how many dozens of tries it took to figure out that fast clicking was the culprit.)

Having verified that it actually existed, I added instrumentation to tell me what my session ID was, and noticed, as expected, that it changed when I was suddenly logged out.  Sure enough, the session was getting wiped.  But why?

Racking my brains to figure out “What could reset a session in Rails other than explicitly trying to do it?”, I started listing and discarding some candidates:

  • The cookie expired in the browser — nope, expiry was set correctly
  • The cookie got eaten by the hop from Nginx to Mongrel — nope, after investigation, cookies always matched on both sides (like expected)
  • The cookie got too big and failed to serialize properly — nope, after looking through the Rails codebase, that looked like it would throw an exception
  • The cookie got reset when Rails detected malicious behavior coming from the browser — bingo!

CSRF Protection: When It Breaks, It Breaks Very Quietly

Cross-site request forgery (CSRF) is tricking the browser, via a malicious (or compromised) site B, into accessing something on site A.  Since requests for site A will carry A’s cookie whether requested by A or not, an image tag or embedded Javascript on B can do anything on A that a logged-in user can do, like accessing /accounts/wire_all_your_money_to_switzerland with the appropriate POST parameters to make it happen.  This is, to put it mildly, a bad thing.  Rails has lovely magic which defends against CSRF for you.  All you have to do is include two lines of code:
#In application_controller.rb
protect_from_forgery

#In your templates' HEAD somewhere
<%= csrf_meta_tag %>

Rails will then basically generate a cryptographically secure random number, totally transparently to you. This is called the CSRF token.

One copy goes in your Rails session, where only your server and the client can see it.  (n.b. Rails sessions are a bit opaque since they are Base64 encoded, but they can be trivially decoded by anyone who can read the cookie, including the end-user.  They can’t be forged because of another security feature, but don’t put anything you don’t want the user to know in the session.)
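The "readable but not forgeable" property is easy to demonstrate. The sketch below imitates the general shape of a signed cookie (Base64-encoded data plus an HMAC digest computed with a server-side secret). It is an illustration of the idea under those assumptions, not Rails's actual implementation; the helper names, the secret, and the session contents are all made up.

```ruby
require "openssl"

def hmac(data, secret)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), secret, data)
end

# Serialize the session, Base64-encode it, and append a digest of the result.
def sign_session(session, secret)
  data = [Marshal.dump(session)].pack("m0")   # "m0" is Base64 without newlines
  "#{data}--#{hmac(data, secret)}"
end

# What anyone holding the cookie can do: decode it without knowing the secret.
def peek(cookie)
  data, _digest = cookie.split("--", 2)
  Marshal.load(data.unpack("m0").first)
end

# Constant-time comparison, so the check does not leak the digest via timing.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# What the server does: recompute the digest and reject the cookie on mismatch.
def verify(cookie, secret)
  data, digest = cookie.split("--", 2)
  return nil unless secure_equal?(digest.to_s, hmac(data.to_s, secret))
  Marshal.load(data.unpack("m0").first)
end

cookie = sign_session({ user_id: 42 }, "server-side secret")
p peek(cookie)                                   # prints {:user_id=>42} or {user_id: 42}, depending on Ruby version
forged = sign_session({ user_id: 1 }, "attacker guess")
p verify(forged, "server-side secret")           # prints nil
```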

Another copy goes in the document’s HEAD (for access via Javascript) and in Rails-generated forms as a hidden value.  When a Rails-generated page makes a PUT or POST request to the server (via helper-generated form or helper-generated Javascript), it submits the copy included in the HTML code with the request.  Rails then compares it to the one in the session and bounces requests where they don’t match. Bad actors on other sites shouldn’t be able to read either a) a page on your site (the same origin policy prevents this) or b) the contents of your cookie from your site, so this is secure.

The specifics of how it “bounces requests” are very important.

Point Releases Sometimes Contain Doozies

My personal understanding of Rails up until an hour ago was that a CSRF violation would raise an exception.  This would practically never get seen by a legitimate user operation, so few people are aware of that, but I had seen it a time or two when security auditing BCC.  (Some of my geeky friends had, back in the day, exploited BCC with a CSRF and helpfully told me how to fix it.  Naturally, after fixing it I verified that the site worked as expected with the fix.)

So if the CSRF protection was somehow eating sessions, I would expect to see that exception getting logged and emailed to me by Airbrake (formerly Hoptoad — it emails you when an exception happens in production, highly recommended).   That wasn’t happening.

Then I decided to dig into the Rails source.  Whereupon I learned that Rails 2.3.11 changed the behavior of CSRF protection: instead of throwing exceptions, it would silently just clear the session and re-run the request.  For most sensitive operations (e.g. those which require a signed in user), this would force a signout and then any potentially damaging operation would be averted.

Here’s the relevant code in Rails 2.3.11:

def verify_authenticity_token
  verified_request? || handle_unverified_request
end

def handle_unverified_request
  reset_session
end

Versus the relevant code in Rails 2.3.10 (sidenote: you can see all of this easily in Github because Rails is diligent about tagging releases, a practice you should certainly follow in your own development):

def verify_authenticity_token
  verified_request? || raise(ActionController::InvalidAuthenticityToken)
end

And, sure enough, checking Subversion showed that I upgraded the version of Rails I was using in January of this year in response to this security advisory. I read that, made the required modifications to my application, tested, and had no problems.
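The practical difference between the two releases can be simulated in a few lines of plain Ruby. This is a stand-in, not Rails code: the exception class, the run_action helper, and the :human session key are hypothetical, and the session is just a Hash.

```ruby
class InvalidAuthenticityToken < StandardError; end

# Stand-in for a controller action guarded by CSRF verification.
# strategy selects which release's failure behavior to imitate.
def run_action(session, token_valid:, strategy:)
  unless token_valid
    case strategy
    when :raise_exception then raise InvalidAuthenticityToken   # 2.3.10 style: loud
    when :reset_session   then session.clear                    # 2.3.11 style: silent
    end
  end
  session[:human] = true   # the action body still runs after a reset
  session
end

session = { user_id: 42 }
run_action(session, token_valid: false, strategy: :reset_session)
p session.key?(:user_id)   # prints false: the user has been signed out, and nothing was logged
```

Under the :raise_exception strategy the same call would surface as an error report, which is exactly the signal that went missing after the upgrade.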

So What Went Wrong Then?

After I was sure that sessions were being reset (but only in production), I added a bit of instrumentation to the live site to record the session ID for people coming from my IP address and to log when it changed. This let me find the culprit: a bit of Javascript that A/Bingo, my A/B testing library, uses to verify that people are human. It assumes that robots generally won’t run Javascript engines capable of doing POST requests, so it does an ajax-y POST to my server to assert humanity of the end-user, thus keeping almost all bots out of my stats.

That code has been live over a year. Why did it suddenly start causing session resets? Oh, another change in the 2.3.11 upgrade:

The old code:

  # Returns true or false if a request is verified.
  # Comment truncated by Patrick
  def verified_request?
      !protect_against_forgery?     ||
        request.method == :get      ||
        request.xhr?                ||
        !verifiable_request_format? ||
        form_authenticity_token == form_authenticity_param
  end

Notice that request.xhr? will cause this request to be verified if it evaluates to true, regardless of the other things in the OR statements. request.xhr? tests whether a request is ajax-y in nature. A/Bingo’s humanity-verifying POST is, so it didn’t trigger the CSRF check.

The new code, however:

  # Returns true or false if a request is verified.
  # Comment truncated by Patrick
  def verified_request?
    !protect_against_forgery?                            ||
      request.get?                                       ||
      form_authenticity_token == form_authenticity_param ||
      form_authenticity_token == request.headers['X-CSRF-Token']
  end

Yep, as announced in the patch notes, we lost the exemption for XHR requests. So the A/Bingo mark_human request will, because it makes no particular effort to include a CSRF token (which I will be changing very quickly, as A/Bingo is my project), with certainty cause the CSRF check to fail in 2.3.11. This will result in not a noisy exception (the previous behavior) but instead a silent reset followed by re-running the action. A/Bingo, which doesn’t care a whit whether you’re logged in, will then mark your freshly new session as human. If the previous contents of your session mattered, for example to keep you signed in, they are now gone. A/Bingo will not reaudit your humanity, though, because your session now marks you as human, so this will only ever happen to your browser once.
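To make the difference concrete, here is a minimal sketch in plain Ruby. It is not the Rails code itself: the `Request` struct, the `SESSION_TOKEN` constant, and both method names are made up for illustration, and the `protect_against_forgery?` and `verifiable_request_format?` clauses are left out to keep it short. It only models the part that matters here, which is what happens to an ajax-y POST that carries no token.

```ruby
# Hypothetical stand-in for a Rails request; not the real ActionController API.
Request = Struct.new(:method, :xhr, :param_token, :header_token, keyword_init: true)

SESSION_TOKEN = "s3cret" # made-up value standing in for the token stored in the session

# Simplified version of the old check: any XHR request is waved through.
def verified_old?(req)
  req.method == :get || req.xhr || req.param_token == SESSION_TOKEN
end

# Simplified version of the new check: no XHR exemption, a token must match.
def verified_new?(req)
  req.method == :get ||
    req.param_token == SESSION_TOKEN ||
    req.header_token == SESSION_TOKEN
end

# An ajax-y POST with no token, like the mark_human call described above.
mark_human = Request.new(method: :post, xhr: true)
# The same POST once it sends the token in a header.
patched = Request.new(method: :post, xhr: true, header_token: SESSION_TOKEN)

puts verified_old?(mark_human) # prints true
puts verified_new?(mark_human) # prints false
puts verified_new?(patched)    # prints true
```

The middle line is the whole bug: the same request that used to sail through now fails verification, and failing verification now means a quiet session reset rather than an exception.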

Race Conditions: Not Just In Java Thread Programming

So why did this never show up in development and why did it show up only sporadically in production? Well, consider how a browser interprets a page presented to it: it first downloads the HTML, then downloads the assets, blocking when it discovers e.g. CSS or Javascript which alters the document. This means that Javascript very low on a page may never execute if someone above it blocks it until the user navigates away. (This is a pretty gross simplification of how multiple pieces of very complicated and often incompatible software do something very difficult. If you want details, read stuff by the people behind YSlow. They’re geniuses and taught me all that I successfully learned about this process.) Someone like, say, external analytics utilities loaded over the public Internet. My page includes a few such scripts, like Google Analytics and CrazyEgg. They are off in development to avoid polluting my stats.

This plus the lack of network latency means that, on development, a browser which sees a page that includes the humanity testing Javascript will almost certainly execute it. That will cause the session to be burned, once, on the first page load. Since my invariable workflow for manual testing is “Start with a new browser at the homepage or landing page, do stuff, make sure it works”, the order of execution is:

  1. I load the front page or a landing page. The session is initialized to some value S1.
  2. (A few milliseconds later.) The A/Bingo Javascript checks for my humanity, resetting the session to some new value S2.
  3. I hit the registration or login button, and the site works as I expect it to.
  4. Since the site knows I am human now, that never gets checked again, and the session never gets reset again.

In production though, the workflow could very well be:

  1. The user arrives at the front page or landing page. The session is initialized to some value S1, including (say) their referrer information.
  2. A bunch of Javascript starts loading ahead of the A/Bingo check.
  3. The user, within one or two seconds (depending on network latency to those external scripts), either logs in or creates an account.
  4. The browser never successfully executes the A/Bingo check.
  5. The user arrives at their dashboard. When it is rendered, the server (robotically) decides it isn’t quite sure if they are human yet, and includes that Javascript again. (This behavior is by design, because I was aware of the timing issue. I just didn’t realize how it would shake out with the 2.3.11 upgrade.)
  6. This time, the user ponders their dashboard enough for the A/Bingo Javascript to post successfully. This resets their session to some new value S2.
  7. The user clicks anything on the page, and (because S2 doesn’t include their logged in user ID) gets taken to a login screen.
  8. The user is now durably marked as human, so the A/Bingo check never fires again, preventing a second unceremonious logout.

This neatly explains the logged out users. How to explain the missing referrer information? Well, if the user is NOT very fast on the click on the first page, they’ll have their referrer cleared out of the session before they successfully sign up. They’ll get marked as a human prior to creating their account, though, so they’ll never even notice the unceremonious logout. This is the behavior of the overwhelming bulk of new users, which is why the stats were getting comprehensively borked but almost no users thought to complain.

This difference in behavior based on the hidden interaction of two concurrent processes is called a race condition. Race conditions are why sane programmers don’t program with threads or, if they do, they use shared-nothing architecture and pass all communication between the threads through a message queue written by someone who knows what they are doing (if you have to ask, it isn’t you — seriously, multithreaded programming is hard). I haven’t seen a race condition in years, because the genre of web applications I write and their architectures makes me mostly immune to them. Well, I just got busted back to CS102. Sadly, the core lesson of CS102 hasn’t changed: reasoning through why race conditions happen is very hard.
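The two orderings can be replayed in a toy model. This is only a sketch under loud assumptions: `ToySite`, `log_in`, and the session keys are invented for illustration, and the real application obviously involves a browser, a network, and Rails rather than three method calls. The one behavior it borrows from the story above is that the tokenless humanity POST resets the session and then marks the fresh session as human.

```ruby
# Toy model of the server side. Every name here is hypothetical.
class ToySite
  attr_reader :session

  def initialize(referrer)
    @session = { referrer: referrer }
  end

  def log_in(user_id)
    @session[:user_id] = user_id
  end

  # The tokenless humanity POST: the CSRF check fails, the session is
  # silently reset, and the action then runs against the fresh session.
  def mark_human
    # Once a session is marked human the page stops including the check;
    # that is modeled here as a no-op.
    return if @session[:human]

    @session = {}
    @session[:human] = true
  end
end

# Ordering 1 (development): the humanity check wins the race.
dev = ToySite.new("search engine")
dev.mark_human
dev.log_in(42)

# Ordering 2 (production, fast clicker): the login wins the race.
prod = ToySite.new("search engine")
prod.log_in(42)
prod.mark_human

puts dev.session.key?(:user_id)   # prints true: still logged in
puts dev.session.key?(:referrer)  # prints false: referrer wiped before signup
puts prod.session.key?(:user_id)  # prints false: unceremoniously logged out
```

Same two events, different order, different bug: the first ordering quietly eats the referrer, the second one eats the login.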

Saved By Unsophisticated Users, Sort Of

Users returning after the session naturally expired (2 weeks) would go through the dance again, potentially getting asked to log in twice. However, most of them took long enough to find the Sign In button that the human check had already completed by then, so the percentage of users who actually saw the bug was fairly small. (I’m guessing, from a quick heuristic run on my log files, that it was below 1% of accounts. That’s the optimistic way to say it. The pessimistic way is to say that this bug negatively affected the better part of a thousand people, and probably cost me sales from some of them.)

Whose Fault Is This?

If my users are inconvenienced, it is my fault, always. I should have read the patch notes for 2.3.11 more diligently, to discover the very consequential line “In addition to altering the templates, an application’s javascript must be changed to send the token with Ajax requests.”, and I should have been more aware that there was a one-line Javascript method pulled in by a library (which I wrote, so that is no excuse) which was not automatically upgraded with the Rails helper methods.

I’m not sure if more diligent testing would have caught this. Race conditions are hard to diagnose, and while I might have caught it by including production levels of external Javascript in my development environment, the symptoms would only have been visible a fraction of the time anyhow, and in ways which didn’t visibly break the application most of the time. (Who checks their stats for the development version to make sure they’re sensible after implementing that function correctly the first time?)

What I really should have done about this is address it earlier, when I first got the inkling that there was some weird edge case which would cause a logged in user to become logged out. I futzed around with my configuration once or twice and saw the problem go away (because it was non-deterministic), but rather than futzing I should have figured out a complicated but reducible series of steps that would always cause the issue. That would have sent me down the right road for fixing it.

So How Do You Address This?

Immediate term, a one-line patch turns off CSRF protection for the A/Bingo mark_human action, preventing it from accidentally resetting the session.

skip_filter :verify_authenticity_token, :only => :mark_human

I also added a note about this to the A/Bingo documentation. I’ll patch A/Bingo after I have enough brain cells left to do that in a way which won’t break anyone’s applications. After I patch A/Bingo, that work-around won’t be necessary.

Why’d You Write This Post?

Because, after hours spelunking in Firebug, my codebase, and the innards of an obsolete version of Rails to understand what was happening, I had to tell somebody. Some people have water coolers. I have the Internet. Hopefully, someone in this wide world will find this discussion useful.

If you’re wondering what the day-to-day life of an engineer is like or why it’s so dang hard some of the time, this might be a good example (of the pathological case — the typical case is writing boring code which solves boring problems, like laying out a 5×5 grid on a bingo card and randomizing the word order). Bingo Card Creator is not terribly complicated software when compared to most applications, but it sits on top of other pieces of code (Rails, the web server, the browser, the TCP/IP stack, the underlying OS, the hardware on both ends, etc) which collectively are orders of magnitude more complicated than any physical artifact ever created by the human race.

Most of the time that complexity is abstracted away from both the user and the developer, both as blissfully ignorant of the layers below as an ant walking on an aircraft carrier is ignorant of the depth of the ocean. But when a problem bubbles up and writing it off to gremlins isn’t getting the job done, you have to start looking at the lower levels of abstraction. That is rather harder than dealing with just the higher levels of abstraction. (Joel Spolsky has an article about this subject.)

Don't Call Yourself A Programmer, And Other Career Advice

If there was one course I could add to every engineering education, it wouldn’t involve compilers or gates or time complexity.  It would be Realities Of Your Industry 101, because we don’t teach them and this results in lots of unnecessary pain and suffering.  This post aspires to be README.txt for your career as a young engineer.  The goal is to make you happy, by filling in the gaps in your education regarding how the “real world” actually works.  It took me about ten years and a lot of suffering to figure out some of this, starting from “fairly bright engineer with low self-confidence and zero practical knowledge of business.”  I wouldn’t trust this as the definitive guide, but hopefully it will provide value over what your college Career Center isn’t telling you.

90% of programming jobs are in creating Line of Business software: Economics 101: the price for anything (including you) is a function of the supply of it and demand for it.  Let’s talk about the demand side first.  Most software is not sold in boxes, available on the Internet, or downloaded from the App Store.  Most software is boring one-off applications in corporations, under-girding every imaginable facet of the global economy.  It tracks expenses, it optimizes shipping costs, it assists the accounting department in preparing projections, it helps design new widgets, it prices insurance policies, it flags orders for manual review by the fraud department, etc etc.  Software solves business problems.  Software often solves business problems despite being soul-crushingly boring and of minimal technical complexity.  For example, consider an internal travel expense reporting form.  Across a company with 2,000 employees, that might save 5,000 man-hours a year (at an average fully-loaded cost of $50 an hour) versus handling expenses on paper, for a savings of $250,000 a year.  It does not matter to the company that the reporting form is the world’s simplest CRUD app, it only matters that it either saves the company costs or generates additional revenue.

There are companies which create software which actually gets used by customers, which describes almost everything that you probably think of when you think of software.  It is unlikely that you will work at one unless you work towards making this happen.  Even if you actually work at one, many of the programmers there do not work on customer-facing software, either.

Engineers are hired to create business value, not to program things:  Businesses do things for irrational and political reasons all the time (see below), but in the main they converge on doing things which increase revenue or reduce costs.  Status in well-run businesses generally is awarded to people who successfully take credit for doing one of these things.  (That can, but does not necessarily, entail actually doing them.)  The person who has decided to bring on one more engineer is not doing it because they love having a geek around the room, they are doing it because adding the geek allows them to complete a project (or projects) which will add revenue or decrease costs.  Producing beautiful software is not a goal.  Solving complex technical problems is not a goal.  Writing bug-free code is not a goal.  Using sexy programming languages is not a goal.  Add revenue.  Reduce costs.  Those are your only goals.

Peter Drucker — you haven’t heard of him, but he is a prophet among people who sign checks — came up with the terms Profit Center and Cost Center.  Profit Centers are the part of an organization that bring in the bacon: partners at law firms, sales at enterprise software companies, “masters of the universe” on Wall Street, etc etc.  Cost Centers are, well, everybody else.  You really want to be attached to Profit Centers because it will bring you higher wages, more respect, and greater opportunities for everything of value to you.  It isn’t hard: a bright high schooler, given a paragraph-long description of a business, can usually identify where the Profit Center is.  If you want to work there, work for that.  If you can’t, either a) work elsewhere or b) engineer your transfer after joining the company.

Engineers in particular are usually very highly paid Cost Centers, which sets MBA’s optimization antennae to twitching.  This is what brings us wonderful ideas like outsourcing, which is “Let’s replace really expensive Cost Centers who do some magic which we kinda need but don’t really care about with less expensive Cost Centers in a lower wage country”.  (Quick sidenote: You can absolutely ignore outsourcing as a career threat if you read the rest of this guide.)  Nobody ever outsources Profit Centers.  Attempting to do so would be the setup for MBA humor.  It’s like suggesting replacing your source control system with a bunch of copies maintained on floppy disks.

Don’t call yourself a programmer: “Programmer” sounds like “anomalously high-cost peon who types some mumbo-jumbo into some other mumbo-jumbo.”  If you call yourself a programmer, someone is already working on a way to get you fired.  You know Salesforce, widely perceived among engineers to be a Software as a Service company?  Their motto and sales point is “No Software”, which conveys to their actual customers “You know those programmers you have working on your internal systems?  If you used Salesforce, you could fire half of them and pocket part of the difference in your bonus.”  (There’s nothing wrong with this, by the way.  You’re in the business of unemploying people.  If you think that is unfair, go back to school and study something that doesn’t matter.)

Instead, describe yourself by what you have accomplished for previous employers vis-a-vis increasing revenues or reducing costs.  If you have not had the opportunity to do this yet, describe things which suggest you have the ability to increase revenue or reduce costs, or ideas to do so.

There are many varieties of well-paid professionals who sling code but do not describe themselves as slinging code for a living.  Quants on Wall Street are the first and best-known example: they use computers and math as a lever to make high-consequence decisions better and faster than an unaided human could, and the punchline to those decisions is “our firm makes billions of dollars.”  Successful quants make more in bonuses in a good year than many equivalently talented engineers will earn in a decade or lifetime.

Similarly, even though you might think Google sounds like a programmer-friendly company, there are programmers and then there’s the people who are closely tied to 1% improvements in AdWords click-through rates.  (Hint: provably worth billions of dollars.)  I recently stumbled across a web-page from the guy whose professional bio is “wrote the backend billing code that 97% of Google’s revenue passes through.”  He’s now an angel investor (a polite synonym for “rich”).

You are not defined by your chosen software stack: I recently asked via Twitter what young engineers wanted to know about careers.  Many asked how to know what programming language or stack to study.  It doesn’t matter.  There you go.

Do Java programmers make more money than .NET programmers?  Anyone describing themselves as either a Java programmer or .NET programmer has already lost, because a) they’re a programmer (you’re not, see above) and b) they’re making themselves non-hireable for most programming jobs.  In the real world, picking up a new language takes a few weeks of effort and after 6 to 12 months nobody will ever notice you haven’t been doing that one for your entire career.  I did back-end Big Freaking Java Web Application development as recently as March 2010.  Trust me, nobody cares about that.  If a Python shop was looking for somebody technical to make them a pile of money, the fact that I’ve never written a line of Python would not get held against me.

Talented engineers are rare — vastly rarer than opportunities to use them — and it is a seller’s market for talent right now in almost every facet of the field.  Everybody at Matasano uses Ruby.  If you don’t, but are a good engineer, they’ll hire you anyway.  (A good engineer has a track record of — repeat after me — increasing revenue or decreasing costs.)  Much of Fog Creek uses the Microsoft Stack.  I can’t even spell ASP.NET and they’d still hire me.

There are companies with broken HR policies where lack of a buzzword means you won’t be selected.  You don’t want to work for them, but if you really do, you can add the relevant buzzword to your resume for the cost of a few nights and weekends, or by controlling technology choices at your current job in such a manner that it advances your career interests.  Want to get trained on Ruby at a .NET shop?  Implement a one-off project in Ruby.  Bam, you are now a professional Ruby programmer — you coded Ruby and you took money for it.  (You laugh?  I did this at a Java shop.  The one-off Ruby project made the company $30,000.  My boss was, predictably, quite happy and never even asked what produced the deliverable.)

Co-workers and bosses are not usually your friends: You will spend a lot of time with co-workers.  You may eventually become close friends with some of them, but in general, you will move on in three years and aside from maintaining cordial relations you will not go out of your way to invite them over to dinner.  They will treat you in exactly the same way.  You should be a good person to everyone you meet — it is the moral thing to do, and as a sidenote will really help your networking — but do not be under the delusion that everyone is your friend.

For example, at a job interview, even if you are talking to an affable 28 year old who feels like a slightly older version of you, he is in a transaction.  You are not his friend, you are an input for an industrial process which he is trying to buy for the company at the lowest price.  That banter about World of Warcraft is just establishing a professional rapport, but he will (perfectly ethically) attempt to do things that none of your actual friends would ever do, like try to talk you down several thousand dollars in salary or guilt-trip you into spending more time with the company when you could be spending time with your actual friends.  You will have other coworkers who — affably and ethically — will suggest things which go against your interests, from “I should get credit for that project you just did” (probably not phrased in so many words) to “We should do this thing which advances my professional growth goals rather than yours.”  Don’t be surprised when this happens.

You radically overestimate the average skill of the competition because of the crowd you hang around with:  Many people already successfully employed as senior engineers cannot actually implement FizzBuzz.  Just read it and weep.  Key takeaway: you probably are good enough to work at that company you think you’re not good enough for.  They hire better mortals, but they still hire mortals.

“Read ad.  Send in resume.  Go to job interview.  Receive offer.” is the exception, not the typical case, for getting employment: Most jobs are never available publicly, just like most worthwhile candidates are not available publicly (see here).  Information about the position travels at approximately the speed of beer, sometimes lubricated by email.  The decisionmaker at a company knows he needs someone.  He tells his friends and business contacts.  One of them knows someone — family, a roommate from college, someone they met at a conference, an ex-colleague, whatever.  Introductions are made, a meeting happens, and they achieve agreement in principle on the job offer.  Then the resume/HR department/formal offer dance comes about.

This is disproportionately true of jobs you actually want to get.  “First employee at a successful startup” has a certain cachet for a lot of geeks, and virtually none of those got placed by sending in a cover letter to an HR department, in part because two-man startups don’t have enough scar tissue to form HR departments yet.  (P.S. You probably don’t want to be first employee for a startup.  Be the last co-founder instead.)  Want to get a job at Google?  They have a formal process for giving you a leg up because a Googler likes you.  (They also have multiple informal ways for a Googler who likes you an awful lot to short-circuit that process.  One example: buy the company you work for.  When you have a couple of billion lying around you have many interesting options for solving problems.)

There are many reasons why most hiring happens privately.  One is that publicly visible job offers get spammed by hundreds of resumes (particularly in this economy) from people who are stunningly inappropriate for the position.  The other is that other companies are so bad at hiring that, if you don’t have close personal knowledge about the candidate, you might accidentally hire a non-FizzBuzzer.

Networking: it isn’t just for TCP packets: Networking just means a) meeting people who at some point can do things for you (or vice versa) and b) making a favorable impression on them.

There are many places to meet people.  Events in your industry, such as conferences or academic symposia which get seen by non-academics, are one.  User groups are another.  Keep in mind that user groups draw a very different crowd than industry conferences and optimize accordingly.

Strive to help people.  It is the right thing to do, and people are keenly aware of who have in the past given them or theirs favors.  If you ever can’t help someone but know someone who can, pass them to the appropriate person with a recommendation.  If you do this right, two people will be happy with you and favorably disposed to helping you out in the future.

You can meet people over the Internet (oh God, can you), but something in our monkey brains makes in-the-flesh meeting a bigger thing.  I’ve Internet-met a great many people who I’ve then gone on to meet in real life.  The physical handshake is a major step up in the relationship, even when Internet-meeting led to very consequential things like “Made them a lot of money through good advice.”  Definitely blog and participate on your industry-appropriate watering holes like HN, but make it out to the meetups for them, too.

Academia is not like the real world: Your GPA largely doesn’t matter (modulo one high profile exception: a multinational advertising firm).  To the extent that it does matter, it only determines whether your resume gets selected for job interviews.  If you’re reading the rest of this, you know that your resume isn’t the primary way to get job interviews, so don’t spend huge amounts of effort optimizing something that you either have sufficiently optimized already (since you’ll get the same number of interviews at 3.96 as you will at 3.8) or that you don’t need at all (since you’ll get job interviews because you’re competent at asking the right people to have coffee with you).

Your major and minor don’t matter.  Most decisionmakers in industry couldn’t tell the difference between a major in Computer Science and a major in Mathematics if they tried.  I was once reduced to tears because a minor academic snafu threatened my ability to get a Bachelor of Science with a major in Computer Science, which my advisor told me was more prestigious than a Bachelor of Science in Computer Science.  Academia cares about distinctions like that.  The real world does not.

Your professors might understand how the academic job market works (short story: it is ridiculously inefficient in engineering and fubared beyond mortal comprehension in English) but they often have quixotic understandings of how the real world works.  For example, they may push you to get extra degrees because a) it sounds like a good idea to them and b) they enjoy having research-producing peons who work for ramen.  Remember, market wages for people capable of producing research are $80~100k+++ in your field.  That buys an awful lot of ramen.

The prof in charge of my research project offered me a spot in his lab, a tuition waiver, and a whole $12,000 as a stipend if I would commit 4~6 years to him.  That’s a great deal if, and only if, you have recently immigrated from a low-wage country and need someone to intervene with the government to get you a visa.

If you really like the atmosphere at universities, that is cool.  Put a backpack on and you can walk into any building at any university in the United States any time you want.  Backpacks are a lot cheaper than working in academia.   You can lead the life of the mind in industry, too — and enjoy less politics and better pay.  You can even get published in journals, if that floats your boat.  (After you’ve escaped the mind-warping miasma of academia, you might rightfully question whether Published In A Journal is really personally or societally significant as opposed to close approximations like Wrote A Blog Post And Showed It To Smart People.)

How much money do engineers make?

Wrong question.  The right question is “What kind of offers do engineers routinely work for?”, because salary is one of many levers that people can use to motivate you.  The answer to this is, less than helpfully, “Offers are all over the map.”

In general, big companies pay more (money, benefits, etc) than startups.  Engineers with high perceived value make more than those with low perceived value.  Senior engineers make more than junior engineers.  People working in high-cost areas make more than people in low-cost areas.  People who are skilled in negotiation make more than those who are not.

We have strong cultural training to not ask about salary, ever.  This is not universal.  In many cultures, professional contexts are a perfectly appropriate time to discuss money.  (If you were a middle class Japanese man, you could reasonably be expected to reveal your exact salary to a 2nd date, anyone from your soccer club, or the guy who makes your sushi.  If you owned a company, you’d probably be cagey about your net worth but you’d talk about employee salaries the way programmers talk about compilers — quite frequently, without being embarrassed.)   If I were a Marxist academic or a conspiracy theorist, I might think that this bit of middle class American culture was specifically engineered to be in the interests of employers and against the interests of employees.  Prior to a discussion of salary at any particular target employer, you should speak to someone who works there in a similar situation and ask about the salary range for the position.  It is <%= Date.today.year %>; you can find these people online.  (LinkedIn, Facebook, Twitter, and your (non-graph-database) social networks are all good to lean on.)

Anyhow.  Engineers are routinely offered a suite of benefits.  It is worth worrying, in the United States, about health insurance (traditionally, you get it and your employer foots most or all of the costs) and your retirement program, which is some variant of “we will match contributions to your 401k up to X% of salary.”  The value of that is easy to calculate: X% of salary.  (It is free money, so always contribute to your 401k at least up to the employer match.  Put it in index funds and forget about it for 40 years.)

There are other benefits like “free soda”, “catered lunches”, “free programming books”, etc.  These are social signals more than anything else.  When I say that I’m going to buy you soda, that says a specific thing about how I run my workplace, who I expect to work for me, and how I expect to treat them.  (It says “I like to move the behavior of unsophisticated young engineers by making this job seem fun by buying 20 cent cans of soda, saving myself tens of thousands in compensation while simultaneously encouraging them to ruin their health.”  And I like soda.)  Read social signals and react appropriately — someone who signals that, e.g., employee education is worth paying money for might very well be a great company to work for — but don’t give up huge amounts of compensation in return for perks that you could trivially buy.

How do I become better at negotiation?  This could be a post in itself.  Short version:

a)  Remember you’re selling the solution to a business need (raise revenue or decrease costs) rather than programming skill or your beautiful face.

b)  Negotiate aggressively with appropriate confidence, like the ethical professional you are.  It is what your counterparty is probably doing.  You’re aiming for a mutually beneficial offer, not for saying Yes every time they say something.

c)  “What is your previous salary?” is employer-speak for “Please give me reasons to pay you less money.”  Answer appropriately.

d)  Always have a counteroffer.  Be comfortable counteroffering around axes you care about other than money.  If they can’t go higher on salary then talk about vacation instead.

e)  The only time to ever discuss salary is after you have reached agreement in principle that they will hire you if you can strike a mutually beneficial deal.  This is late in the process after they have invested a lot of time and money in you, specifically, not at the interview.  Remember that there are large costs associated with them saying “No, we can’t make that work” and, appropriately, they will probably not scuttle the deal over comparatively small issues which matter quite a bit to you, like e.g. taking their offer and countering for that plus a few thousand bucks then sticking to it.

f)  Read a book.  Many have been written about negotiation.  I like Getting To Yes.  It is a little disconcerting that negotiation skills are worth thousands of dollars per year for your entire career but engineers think that directed effort to study them is crazy when that could be applied to trivialities about a technology that briefly caught their fancy.

How to value an equity grant:

Roll d100.  (Not the right kind of geek?  Sorry.  rand(100) then.)

0~70: Your equity grant is worth nothing.

71~94: Your equity grant is worth a lump sum of money which makes you about as much money as you gave up working for the startup, instead of working for a megacorp at a higher salary with better benefits.

95~99: Your equity grant is a lifechanging amount of money.  You won’t feel rich — you’re not the richest person you know, because many of the people you spent the last several years with are now richer than you by definition — but your family will never again give you grief for not having gone into $FAVORED_FIELD like a proper $YOUR_INGROUP.

100: You worked at the next Google, and are rich beyond the dreams of avarice.  Congratulations.

Perceptive readers will note that 100 does not actually show up on a d100 or rand(100).
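For readers who would rather see the table as code, here is a minimal Python sketch of the roll described above.  The function names and the use of `random.randrange` are my own illustration and not part of any real equity calculator; the bands are copied straight from the list.

```python
import random
from collections import Counter


def equity_outcome(roll: int) -> str:
    """Map a roll onto the outcome bands listed above."""
    if 0 <= roll <= 70:
        return "worth nothing"
    if 71 <= roll <= 94:
        return "roughly makes up for the salary you gave up"
    if 95 <= roll <= 99:
        return "lifechanging amount of money"
    if roll == 100:
        return "rich beyond the dreams of avarice"
    raise ValueError(f"not a valid roll: {roll}")


def roll_rand_100(rng: random.Random) -> int:
    """Equivalent of rand(100): an integer from 0 through 99, never 100."""
    return rng.randrange(100)


# Count how many of the 100 possible rolls land in each band.
band_sizes = Counter(equity_outcome(roll) for roll in range(100))
for outcome, count in band_sizes.items():
    print(f"{count:3d} in 100: {outcome}")
```

The top band never shows up in the printed counts, because no possible roll reaches 100.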

Why are you so negative about equity grants?

Because you radically overestimate the likelihood that your startup will succeed and radically overestimate the portion of the pie that will be allocated to you if the startup succeeds.  Read about dilution and liquidation preferences on Hacker News or Venture Hacks, then remember that there are people who know more about negotiating deals than you know about programming and imagine what you could do to a program if there were several hundred million on the line.

Are startups great for your career as a fresh graduate?

The high-percentage outcome is you work really hard for the next couple of years, fail ingloriously, and then are jobless and looking to get into another startup.  If you really wanted to get into a startup two years out of school, you could also just go work at a megacorp for the next two years, earn a bit of money, then take your warchest, domain knowledge, and contacts and found one.

Working at a startup, you tend to meet people doing startups.  Most of them will not be able to hire you in two years.  Working at a large corporation, you tend to meet other people in large corporations in your area.  Many of them either will be able to hire you or will have the ear of someone able to hire you in two years.

So would you recommend working at a startup?

Working in a startup is a career path but, more than that, it is a lifestyle choice.  This is similar to working in investment banking or academia.  Those are three very different lifestyles.  Many people will attempt to sell you those lifestyles as being in your interests, for their own reasons.  If you genuinely would enjoy that lifestyle, go nuts.  If you only enjoy certain bits of it, remember that many things are available a la carte if you really want them.  For example, if you want to work on cutting-edge technology but also want to see your kids at 5:30 PM, you can work on cutting-edge technology at many, many, many megacorps.

(Yeah, really.  If it creates value for them, heck yes, they’ll invest in it.  They’ll also invest in a lot of CRUD apps, but then again, so do startups — they just market making CRUD apps better than most megacorps do.  The first hour of the Social Network is about making a CRUD app seem sexy, the second is a Lifetime drama about a divorce improbably involving two heterosexual men.)

Your most important professional skill is communication: Remember how engineers are not hired to create programs but are hired to create business value?  The dominant quality which gets you jobs is the ability to give people the perception that you will create value.  This is not necessarily coextensive with ability to create value.

Some of the best programmers I know are pathologically incapable of carrying on a conversation.  People disproportionately a) wouldn’t want to work with them or b) will underestimate their value-creation ability because they gain insight into that ability through conversation and the person just doesn’t implement that protocol.  Conversely, people routinely assume that I am among the best programmers they know entirely because a) there exists observable evidence that I can program and b) I write and speak really, really well.

(Once upon a time I would have described myself as “Slightly below average” in programming skill.  I have since learned that I had a radically skewed impression of the skill distribution, that programming skill is not what people actually optimize for, and that modesty is against my interests.  These days if you ask me how good of a programmer I am I will start telling you stories about how I have programmed systems which helped millions of kids learn to read or which provably made companies millions.  The question of where I am on the bell curve matters to no one, so why bother worrying about it?)

Communication is a skill.  Practice it: you will get better.  One key sub-skill is being able to quickly, concisely, and confidently explain how you create value to someone who is not an expert in your field and who does not have a priori reasons to love you.  If when you attempt to do this technical buzzwords keep coming up (“Reduced 99th percentile query times by 200 ms by optimizing indexes on…”), take them out and try again.  You should be able to explain what you do to a bright 8 year old, the CFO of your company, or a programmer in a different specialty, at whatever the appropriate level of abstraction is.

You will often be called to do Enterprise Sales and other stuff you got into engineering to avoid: Enterprise Sales is going into a corporation and trying to convince them to spend six or seven figures on buying a system which will either improve their revenue or reduce costs.  Every job interview you will ever have is Enterprise Sales.  Politics, relationships, and communication skills matter a heck of a lot, technical reality not quite so much.

When you have meetings with coworkers and are attempting to convince them to implement your suggestions, you will also be doing Enterprise Sales.  If getting stuff done is your job description, then convincing people to get stuff done is a core job skill for you.  Spend appropriate effort on getting good at it.  This means being able to communicate effectively in memos, emails, conversations, meetings, and PowerPoint (when appropriate).  It means understanding how to make a business case for a technological initiative.  It means knowing that sometimes you will make technological sacrifices in pursuit of business objectives and that this is the right call.

Modesty is not a career-enhancing character trait: Many engineers have self-confidence issues (hello, self).  Many also come from upbringings where modesty with regards to one’s accomplishments is culturally celebrated.  American businesses largely do not value modesty about one’s accomplishments.  The right tone to aim for in interviews, interactions with other people, and life is closer to “restrained, confident professionalism.”

If you are part of a team effort and the team effort succeeds, the right note to hit is not “I owe it all to my team” unless your position is such that everyone will understand you are lying to be modest.  Try for “It was a privilege to assist my team by leading their efforts with regards to $YOUR_SPECIALTY.”  Say it in a mirror a thousand times until you can say it with a straight face.  You might feel like you’re overstating your accomplishments.  Screw that.  Someone who claims to Lead Efforts To Optimize Production while having the title Sandwich Artist is overstating their accomplishments.  You are an engineer.  You work magic which makes people’s lives better.  If you were in charge of the database specifically on an important project involving people then heck yes you led the database effort which was crucial for the success of the project.  This is how the game is played.  If you feel poorly about it, you’re like a batter who feels poorly about stealing bases in baseball: you’re not morally superior, you’re just playing poorly.

All business decisions are ultimately made by one or a handful of multi-cellular organisms closely related to chimpanzees, not by rules or by algorithms: People are people.  Social grooming is a really important skill.  People will often back suggestions by friends because they are friends, even when other suggestions might actually be better.  People will often be favorably disposed to people they have broken bread with.  (There is a business book called Never Eat Alone.  It might be worth reading, but that title is whatever the antonym of deceptive advertising is.)  People routinely favor people who they think are like them over people they think are not like them.  (This can be good, neutral, or invidious.  Accepting that it happens is the first step to profitably exploiting it.)

Actual grooming is at least moderately important, too, because people are hilariously easy to hack by expedients such as dressing appropriately for the situation, maintaining a professional appearance, speaking in a confident tone of voice, etc.  Your business suit will probably cost about as much as a computer monitor.  You only need it once in a blue moon, but when you need it you’ll be really, really, really glad that you have it.  Take my word for it: if I wear everyday casual when I visit e.g. City Hall I get treated like a hapless awkward twenty-something; if I wear the suit I get treated like the CEO of a multinational company.  I’m actually the awkward twenty-something CEO of a multinational company, but I get to pick which side to emphasize when I want favorable treatment from a bureaucrat.

(People familiar with my business might object to me describing it as a multinational company because it is not what most people think of when “multinational company” gets used in conversation.  Sorry — it is a simple conversational hack.  If you think people are pissed off at being manipulated when they find that out, well, some people passionately hate business suits, too.  That doesn’t mean business suits are valueless.  Be appropriate to the circumstances.  Technically true answers are the best kind of answers when the alternative is Immigration deporting you, by the way.)

At the end of the day, your life happiness will not be dominated by your career.  Either talk to older people or trust the social scientists who have: family, faith, hobbies, etc etc generally swamp career achievements and money in terms of things which actually produce happiness.  Optimize appropriately.  Your career is important, and right now it might seem like the most important thing in your life, but odds are that is not what you’ll believe forever.  Work to live, don’t live to work.

Speaking At Microconf — Free Ticket Inside

I met Rob Walling of Software By Rob at the Business of Software conference last year, after a couple years of swapping emails.  He and I hit it off, largely since we come from similar places on the “building small profitable software businesses” solution space.  So when he asked if I would fly around the world to speak at MicroConf, a conference he was organizing for small software businesses, I of course said yes.  It is June 6th and 7th in Las Vegas, and tickets are still on sale.  (Special promo code: BINGO gets you $100 off.  I don’t get compensated for that.)

(Update: I gave away the one free ticket, but feel free to use the above promo code.)  I also have a ticket to give away.  It is yours if

  • you have a small software business with an actual product which sells to actual people
  • you have sold at least one copy
  • you can get yourself to Vegas
  • you can find my address and email me explaining what you hope to get out of the conference

Offer good to one person, judging based solely on who I think would benefit most from it.

I have not written my speech yet, but intend to make it worthwhile for folks coming there, both in terms of motivation and in terms of teaching stuff they can actually use for their businesses.  I generally tend to talk marketing when it comes to that.

I’m currently kicking around an extended metaphor about icebergs for the talk.  You see, every business is an iceberg: of the value created by the business, much of it is hidden within the company or (at best) exposed to existing customers, and only a small portion peeks above the waterline, outside of the existing community around the business.  This is unfortunate, because both traditional marketing and SEO revolve around maximizing the visible bit of the iceberg.  There are practical ways to do that which work well for software businesses.  I will likely talk about several of them.

If you have any suggestions on things I should absolutely cover, I’d love to hear them in the comments.

If you come to Microconf, talk to me.  I know most of the other speakers and they’re all very personable people.  You should probably talk to them, too.  But this is an explicit invitation: talk to me.  I’m literally flying halfway around the world and I have no agenda item other than talking to you about your software business.  Ask me for advice.  I can’t promise it will be good advice, but I intend to give lots of it.  I’ll be the tall jet-lagged white guy in the bright red Twilio jacket. (<Plug>Twilio: it’s awesome, you should be using it.  Plus they make awesome red jackets.</Plug>)

Appointment Reminder at 6 Months

<Plug>

The guys at AppSumo approached me and said “Hey, we’d like to do a video of you talking strategy with Andrew Warner.  You guys script it, we’ll edit it and sell it.”  Ordinarily I don’t really do e-books and whatnot but that pitch had me at “Andrew”, because Mixergy is one of the best sources of consistently actionable advice I’ve seen, and I want to help him succeed in whatever little way I can.

The topic of the video is Scalable Content Generation.  It’s the same SEO strategy that I’ve talked about on my blog for years (see Greatest Hits section).  Take my word for it: that is the highest single expected ROI of anything I’ve ever talked about on this blog.  However, those posts are stream-of-consciousness notes from a strategy which evolved organically over years.  Many people tell me that the idea is wonderful; very few actually end up implementing it.

The video is scripted, professionally edited, and organized so that hopefully people will actually implement it this time.  Andrew and I walk you through exactly what I did to turn $3,000 of freelancer writing into $30,000 of sales last year, and discuss how to apply that to an arbitrary online business.

Here was my pitch to AppSumo for why they should have me talk on Scalable Content Generation:

  1. It lets small businesses achieve top rankings for relevant search terms on Google for minimal costs.
  2. It lets small businesses develop an asset which continues to grow in value over time, rather than leasing traffic via e.g. AdWords ads, where you have to continue paying or the spigot turns off.
  3. It allows you to provide huge amounts of actual value to customers without spending a huge amount of time on it.

The video is for sale over here.  Apparently there is an option for getting it free for the next 24 hours (on Friday April 15th, 2011 US time) — after that it will be somewhere south of $100.  They’re also throwing in an hour of consultation with me for somebody who writes a review.

Ask me about my thoughts on e-books and info marketing some other time, but suffice it to say a) I am doing this for free, b) I would not have done it but for Andrew’s involvement, and c) I believe the video has value to online businesses or I wouldn’t be associated with it.

</Plug>

Appointment Reminder Update

Early in December I launched my second software business, Appointment Reminder.  I can’t be as open with it as I am with Bingo Card Creator (you can literally see my sales stats for that one), but I hope to keep folks informed about how things are going.  Long story short: egads, I had forgotten how long it takes to get these things off the ground.

One would assume that, after leaving the 70 ~ 90 hour-a-week day job, I would be able to devote 100% of my concentration to the new business.  That turned out to be grossly over-optimistic: a combination of burnout, reacquainting myself with human life, and distractions from consulting meant that I accomplished almost nothing on AR between May and late October of last year.  Similarly, after launching I took the month off for Christmas, and when I got back in January I should have immediately started applying myself to marketing AR but instead floundered around for quite a bit.  There was a consulting engagement, a few side projects (Achievement Unlocked: Published in Academic Journal), an earthquake, and now we’re almost to Easter and I’m wondering where 2011 has gone.

So that’s the bad news.  The good news:

Got Customers.

AR has had about fifty people sign up for the trial, either by doing it themselves on the website or by me giving them an account manually.  That’s much, much, much slower than BCC — these days, BCC routinely gets 250 signups a day.  The saving grace is that their conversion rates are high: keeping in mind that customers have a 30 day free trial and that many are still within it, about 10 of them have already paid me money and about 10 more look likely to.  The revenue run rate is still inconsequential (south of $500 a month), but AR is cash flow positive (pays for server, calls, credit card processing, etc), and the unit economics for those customers turned out better than expected.

For example, my most popular plan currently is the Professional one, at $29 a month.  This entitles the customer to up to 100 appointments a month.  The worst case scenario for cost to service that customer is about $20 a month paid to Twilio.  My hypothesis was that the actual cost to service the customer would be lower than $5 or so, which makes the economics attractive.  It turns out that most customers on that plan are below $3 apiece.  This means that, if I could just scale customer acquisition, I would be in a very happy place.
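To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python using only the figures quoted above.  The function name and scenario labels are mine, and it deliberately ignores server and credit card processing costs, so treat it as an illustration rather than my actual books.

```python
PLAN_PRICE = 29.00  # Professional plan, dollars per month

# Monthly cost to service one customer under each scenario, in dollars.
# These are the figures from the paragraph above; "observed" is an upper
# bound, since most customers come in below it.
SCENARIOS = {
    "worst case": 20.00,
    "hypothesis": 5.00,
    "observed (upper bound)": 3.00,
}


def margin_percent(price: float, cost: float) -> float:
    """Share of the plan price left over after the cost to service."""
    return 100.0 * (price - cost) / price


for label, cost in SCENARIOS.items():
    dollars_left = PLAN_PRICE - cost
    print(f"{label}: ${dollars_left:.2f} left per month "
          f"({margin_percent(PLAN_PRICE, cost):.1f}% margin)")
```

Even the worst case leaves $9 a month per customer, and the observed numbers leave at least $26, which is why customer acquisition rather than cost is the thing to scale.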

They Love It.

My customers have sent over a thousand reminders regarding 800 or so appointments.  It is anecdotally making a big impact for their businesses: my biggest fan has seen his no-show rate decline to virtually nothing, which singlehandedly “pays for the mortgage.”  Many other customers report that they didn’t previously have a problem with no-shows, but that making reminder calls was a source of frustration for them, and that AR removes the frustration and makes it much more likely that any given client actually gets contacted.

Somewhat surprisingly to me, my customers’ customers love Appointment Reminder, too.  My favorite: “I wish all my [service providers] used this.”  The context for me hearing that was that customer relaying his customers’ opinions to tell me why he wouldn’t stop using Appointment Reminder after getting bitten by a real doozy of a bug.

Bugs Suck

I very carefully avoided doing anything “Mission Critical” when I was an employee, because I didn’t feel like I could offer the requisite level of service.  BCC going down can inconvenience a teacher, but nobody is going to have their day totally ruined by it.

Appointment Reminder is more than capable of totally ruining somebody’s day.  If it just broke, that would be annoying but survivable: clients do not expect to get automated reminders yet from my customers, and most will come in for their appointments regardless of whether they get a reminder or not.  However, “failure to deliver reminders in a timely fashion” is not nearly the worst possible failure case.

An example: during my apartment move in February, due to an ill-considered code push the night before the move, the DelayedJob queue which handles (among other things) outgoing reminders fell over.  Thanks to the magic of Scout, I heard about this essentially instantly.  Well, my cell phone did, at any rate.  My cell phone was packed up with my laptop and other essential computer stuff for transport by hand.  I didn’t hear about the queue falling over until after the move was mostly complete, by which time it was already 8 PM for many of my customers (in the US).

I panicked.  Mistake #3.  I was worried about many customers not getting their reminders for appointments tomorrow, so instead of doing the smart thing and purging the outgoing queue, turning on my In Case Shit Happens button (which prevents any outgoing reminders without my explicit approval), and manually restarting then verifying that the system was stable, I decided to improvise.  Mistake #4.

I visually inspected the queue, which was 1,000+ jobs ranging from outgoing reminders to low-priority requests to external analytics APIs.  I saw one type of queued item that would be annoying to try again — demo calls, which have to occur when a user is still on the website rather than hours later — and purged them.  Then I just restarted the queue workers and watched the queue go from 1,000 jobs down to 0.  Mission accomplished, right?

That night, for some reason I couldn’t sleep, so I turned on my iPad and checked email.  I had several very irate emails from customers, who had just had their morning appointments come in and complain about getting contacted by Appointment Reminder.  Repeatedly.  See, for the several hours that the queue workers were down, a cron job kept saying “Who has an appointment tomorrow?  Millie Smith?  Have we called Millie Smith yet?  OK then, queuing a call for Millie Smith and ignoring her for 5 minutes while that call takes place.”  There are an awful lot of 5 minute intervals in several hours, and the queue was not idempotent, so Millie Smith got many, many calls queued for her.

As soon as I hit “go”, the backed up queue workers blasted through 600 calls, 400 SMSes, and 200 emails, and my website and Twilio received an impromptu stress test.  We passed with flying colors.  Millie Smith’s phone, on the other hand, did not.  The worst affected user got 40 calls, back to back, essentially DDOSing their phone line for 15 minutes straight.
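The duplicate-call failure described above can be reproduced in a few lines.  This is a minimal Python sketch, not the actual DelayedJob code: `ReminderQueue`, `simulate_outage`, the appointment ID, and the 180 minute outage are all hypothetical, and a dedupe key on (appointment, channel) is only one possible way to make enqueueing idempotent.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

JobKey = Tuple[str, str]  # (appointment_id, channel)


@dataclass
class ReminderQueue:
    """In-memory stand-in for a job queue whose workers are down."""
    dedupe: bool
    jobs: List[JobKey] = field(default_factory=list)
    pending_keys: Set[JobKey] = field(default_factory=set)

    def enqueue(self, appointment_id: str, channel: str) -> bool:
        """Queue a reminder; with dedupe on, a repeat request is a no-op."""
        key = (appointment_id, channel)
        if self.dedupe and key in self.pending_keys:
            return False
        self.pending_keys.add(key)
        self.jobs.append(key)
        return True


def simulate_outage(queue: ReminderQueue, appointment_id: str,
                    outage_minutes: int, cron_interval_minutes: int = 5) -> int:
    """Run the cron check once per interval while no worker drains the queue."""
    for _ in range(0, outage_minutes, cron_interval_minutes):
        # No call has completed yet, so the cron job queues another one.
        queue.enqueue(appointment_id, "call")
    return len(queue.jobs)


naive_backlog = simulate_outage(ReminderQueue(dedupe=False), "millie-smith", 180)
idempotent_backlog = simulate_outage(ReminderQueue(dedupe=True), "millie-smith", 180)
print(naive_backlog, idempotent_backlog)
```

With the naive queue every cron tick during the outage adds another call for the same appointment; with the dedupe key the backlog for Millie Smith stays at a single job no matter how long the workers are down.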

I didn’t have Internet at my new apartment yet, so I picked up my laptop and walked 45 minutes across town at 3 AM to my old apartment to perform damage control.  First I hit the In Case Shit Happens button like I should have hours ago — it stayed on for the next several days.  Then I started making phone calls.  This was, unquestionably, the low point of my entrepreneurial career: picture me in a freezing, pitch black apartment at 4 AM in the morning crying in between calls to apoplectic customers of customers.

Things looked much better in the morning.  Surprisingly to me, I only lost two customers in the debacle, and one of them resubscribed after seeing how I handled it.
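For the curious, the In Case Shit Happens button mentioned above boils down to a gate in front of everything outgoing.  Here is a minimal Python sketch of the idea; the class and method names are hypothetical and this is not the code AR actually runs.

```python
from typing import Callable, List


class OutboundGate:
    """Holds outgoing reminders for manual approval while the switch is on."""

    def __init__(self) -> None:
        self.hold_everything = False
        self.held: List[str] = []

    def deliver(self, reminder: str, send: Callable[[str], None]) -> bool:
        """Send immediately unless the switch is on; return True if sent."""
        if self.hold_everything:
            self.held.append(reminder)
            return False
        send(reminder)
        return True

    def approve(self, reminder: str, send: Callable[[str], None]) -> None:
        """Release one held reminder after a human has looked at it."""
        self.held.remove(reminder)
        send(reminder)


sent: List[str] = []
gate = OutboundGate()
gate.hold_everything = True  # hit the button first

# Two duplicate jobs come off the backed-up queue; neither goes out.
gate.deliver("call Millie Smith", sent.append)
gate.deliver("call Millie Smith", sent.append)

# A human approves exactly one of them.
gate.approve("call Millie Smith", sent.append)
print(sent, gate.held)
```

Had the switch been on before the workers restarted, the duplicates would have piled up in the held list for inspection instead of reaching anyone's phone.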

High Touch Sales Processes Are Not My Cup of Tea

I’m fairly decent at marketing software on the Internet with low-touch sales: you click on my AdWords ad or SEO’d piece of content, the website convinces you to take a spin, you like the software, and a sale happens without ever speaking to me.  This is born of necessity: I simply couldn’t routinely talk to people when I had a day job.  Happily, there exist at least a few people who will buy AR on this model.

Also happily, for a different kind of happy, there is a channel for AR that I wasn’t aware existed: white label sellers.  Picture a technology consultant or web development shop which has a relationship with a few dozen small businesses in their area.  Many of them sell hours-for-dollars but they would really love to have recurring revenue sources.  Their clients have business models which involve appointments.  They would like to sell AR to their clients as if it were their own software — it lets them have all of the upside of SaaS businesses (recurring revenue, low support, etc) without actually having to write SaaS.  This also has obvious benefits for me: they have boots on the ground to sell AR to their clients, and I don’t.

I had had this in the back of my mind as an option, but it was on the backburner until somebody came to me with a dream client.  Suffice it to say they were just about ready to sign on the dotted line, and it would have involved enough Small Business ($80 / month) accounts to singlehandedly make AR a smashing success.  I immediately dropped what I was doing and built up the infrastructure to actually offer white label accounts and let the white label customers customize their off-brand AR sites.  (You can see one for a fictitious Ocean Waves Spa here.)  All hosting and software gets taken care of by me.

Then that sale fell through.  It was nobody’s fault, really, the contact’s client just happened to decide to exit the line of business which used appointments.  Oof.  This sort of thing happens quite a bit in sales.  One would think I would be used to it, since it isn’t unknown in consulting either, but it still snuck up on me.

Similarly, actually riding herd on white label accounts has been more difficult than I would have expected.  I have had a dozen leads to folks very interested in offering it and then they just dry up, largely because I am not aggressive enough on pushing the deals forward.  My typical customer support workflow is responding to all email and then thinking that I am done.  It is a new experience when a) people are not trying to tell me about problems and b) this means I have work to do.  For example, many folks need marketing support (brochures, questions answered, and whatnot) to make the sale to their clients, and since they have the relationship but know nothing about AR, I need to figure out a way to get them that support in a timely and proactive manner without interrupting everything else I need to do.

Another niggle I had not expected: some B2B customers are unqualified and it is to your advantage to figure that out early and stop pursuing them.  I had a long exchange of emails with a prospect who does professional development for a particular type of business.  Think salon, but crucially, at a much lower price point than salons operate at.  We were 15 emails and thousands of words into discussing possibilities when she indicated that the $9 a month plan would simultaneously be a) too costly and b) too limited for most of her customers.

“Ah, I don’t believe that my business is the right fit for your needs.  Best of luck in your search for an alternative.”

A business is defined both by what it does and what it does not do.  I don’t want to spend time marketing the service to customers who think $9 is an appreciable amount of money.  (For that and related reasons, I’ll be killing the $9 account tier for new customers as soon as I get the pricing page redesigned.)

What’s Next?

Same old same old!  I’m continuing to develop AR in response to observed customer needs and requests.  The product is very stable these days — I was able to virtually ignore it during a client engagement with no harm done.  Although I don’t know if I would have agreed with it at the time, I’m glad to have taken my licks when I had five customers as opposed to when I have five thousand — that would have made for a very long night of apologies.

I started implementing Scalable Content Generation (see above plug section) for AR.  Currently, I’m at the “experiment by hand” stage.  The site does not have sufficient link equity to rank for much yet, and I’m not totally wowed with my first concept for the content, so I’m going to try something else towards the end of April.  I also have a project or two in the queue along the lines of A/Bingo: produce something of value to people who are not my customers, put it on the website, collect links, use to bootstrap rankings for commercially valuable keywords.

I’m still tentatively targeting 200 paying accounts by the end of this year.  It will take a bit of acceleration to happen, but after May (going back to the US for family and a bit of the consulting/conference circuit), I’ll have most of summer to concentrate on scaling the marketing plan. I am strongly considering various options for taking things to the next level if I can get things that far.  It will depend on a few factors, some business and some personal, but it looks highly likely that there is a viable micro-ISV in AR and quite likely that there is a bigger business there if I want to go after it.

Any questions?

Software For Underserved Markets

They’ve posted my talk at Business of Software 2010.  I highly recommend watching the video first, prior to reading the slides.

BOS2010 was one of the defining moments of my professional career (notes here). I strongly, strongly advise that you come to it in 2011 if you’re interested in taking your software business to the next level: it’s chock full of very, very smart people who are running real, profitable software businesses. (I will probably be speaking again this year, but for longer and with less joking.)

Some Perspective On The Japan Earthquake

[To Japanese readers: a reader has kindly translated this post into Japanese.  Please refer to that version.]

I run a small software business in central Japan.  Over the years, I’ve worked both in the local Japanese government (as a translator) and in Japanese industry (as a systems engineer), and have some minor knowledge of how things are done here.  English-language reporting on the matter has been so bad that my mother is worried for my safety, so in the interests of clearing the air I thought I would write up a bit of what I know.

A Quick Primer On Japanese Geography

Japan is an archipelago made up of many islands, of which there are four main ones: Honshu, Shikoku, Hokkaido, and Kyushu.  The one that almost everybody outside of the country will think of when they think “Japan” is Honshu: in addition to housing Tokyo, Nagoya, Osaka, Kyoto, and virtually every other city that foreigners have heard of, it has most of Japan’s population and economic base.  Honshu is the big island that looks like a banana on your globe, and was directly affected by the earthquake and tsunami…

… to an extent, anyway.  See, the thing that people don’t realize is that Honshu is massive. It is larger than Great Britain.  (A country which does not typically refer to itself as a “tiny island nation.”)  At about 800 miles long, it stretches from roughly Chicago to New Orleans.  Quite a lot of the reporting on Japan, including that which is scaring the heck out of my friends and family, is the equivalent of someone ringing up Mayor Daley during Katrina and saying “My God man, that’s terrible — how are you coping?”

The public perception of Japan, at home and abroad, is disproportionately influenced by Tokyo’s outsized contribution to Japanese political, economic, and social life.  It also gets more news coverage than warranted because one could poll every journalist in North America and not find one single soul who could put Miyagi or Gifu on a map.  So let’s get this out of the way: Tokyo, like virtually the whole island of Honshu, got a bit shaken and no major damage was done.  They have reported 1 fatality caused by the earthquake.  By comparison, on any given Friday, Tokyo will typically have more deaths caused by traffic accidents.  (Tokyo is also massive.)

Miyagi is the prefecture hardest hit by the tsunami, and Japanese TV is reporting that they expect fatalities in the prefecture to exceed 10,000.  Miyagi is 200 miles from Tokyo.  (Remember — Honshu is massive.)  That’s about the distance between New York and Washington DC.

Japanese Disaster Preparedness

Japan is exceptionally well-prepared to deal with natural disasters: it has spent more on the problem than any other nation, largely as a result of frequently experiencing them.  (Have you ever wondered why you use Japanese for “tsunamis” and “typhoons”?)  All levels of the government, from the Self Defense Forces to technical translators working at prefectural technology incubators in places you’ve never heard of, spend quite a bit of time writing and drilling on what to do in the event of a disaster.

For your reference, as approximately the lowest person on the org chart for Ogaki City (it’s in Gifu, which is fairly close to Nagoya, which is 200 miles from Tokyo, which is 200 miles from Miyagi, which was severely affected by the earthquake), my duties in the event of a disaster were:

  • Ascertain my personal safety.
  • Report to the next person on the phone tree for my office, which we drilled once a year.
  • Await mobilization in case response efforts required English or Spanish translation.

Ogaki has approximately 150,000 people.  The city’s disaster preparedness plan lists exactly how many come from English-speaking countries.  It is less than two dozen.  Why have a maintained list of English translators at the ready?  Because Japanese does not have a word for excessive preparation.

Another anecdote: I previously worked as a systems engineer for a large computer consultancy, primarily in making back office systems for Japanese universities.  One such system is called a portal: it lets students check on, e.g., their class schedule from their cell phones.

The first feature of the portal, printed in bold red ink and obsessively tested, was called Emergency Notification.  Basically, we were worried about you attempting to check your class schedule while there was a wall of water coming to inundate your campus, so we built in the capability to take over all pages and say, essentially, “Forget about class.  Get to shelter now.”
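For the technically curious, the "take over all pages" idea can be sketched in a few lines.  This is a minimal illustration and not the actual portal code: the names `EmergencyState` and `render_page`, and the sample schedule text, are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EmergencyState:
    """Hypothetical switch that an engineer flips while following the checklist."""
    message: Optional[str] = None

    def activate(self, message: str) -> None:
        self.message = message

    def clear(self) -> None:
        self.message = None

    @property
    def active(self) -> bool:
        return self.message is not None


def render_page(state: EmergencyState, normal_page: Callable[[], str]) -> str:
    """Serve the emergency notice instead of any normal page while active."""
    if state.active:
        return f"EMERGENCY: {state.message}"
    return normal_page()


def class_schedule() -> str:
    # Stand-in for an ordinary portal page (made-up content).
    return "Monday 9:00 - Introductory Statistics"


state = EmergencyState()
before = render_page(state, class_schedule)   # normal page is served
state.activate("Forget about class. Get to shelter now.")
during = render_page(state, class_schedule)   # every page is now the notice
```

The point of routing every page through one check is that activation is a single action, which is what makes it practical to put on a checklist and have a second engineer confirm.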

Many of our clients are in the general vicinity of Tokyo.  When Nagoya (again, same island but very far away) started shaking during the earthquake, here’s what happened:

  1. T-0 seconds: Oh dear, we’re shaking.
  2. T+5 seconds: Where was that earthquake?
  3. T+15 seconds: The government reports that we just had a magnitude 8.8 earthquake off the coast of East Japan.  Which clients of ours are implicated?
  4. T+30 seconds: Two or three engineers in the office start saying “I’m the senior engineer responsible for X, Y, and Z universities.”
  5. T+45 seconds: “I am unable to reach X University’s emergency contact on the phone.  Retrying.”  (Phones were inundated virtually instantly.)
  6. T+60 seconds: “I am unable to reach X University’s emergency contact on the phone.  I am declaring an emergency for X University.  I am now going to follow the X University Emergency Checklist.”
  7. T+90 seconds: “I have activated emergency systems for X University remotely.  Confirm activation of emergency systems.”
  8. T+95 seconds: (second most senior engineer) “I confirm activation of emergency systems for X University.”
  9. T+120 seconds: (manager of group)  “Confirming emergency system activations, sound off: X University.”  “Systems activated.”  “Confirmed systems activated.”  “Y University.”  “Systems activated.”  “Confirmed systems activated.” …

While this is happening, it’s somebody else’s job to confirm the safety of the colleagues of these engineers, at least a few of whom are out of the office at client sites.  Their checklist helpfully notes that confirmation of the safety of engineers should be done by visual inspection first, because they’ll be really effing busy for the next few minutes.

So that’s the view of the disaster from the perspective of a wee little office several hundred miles away, responsible for a system which, in the scheme of things, was of very, very minor importance.

Scenes like this started playing out up and down Japan within, literally, seconds of the quake.

When the mall I was in started shaking, I at first thought it was because it was a windy day (Japanese buildings are designed to shake because the alternative is to be designed to fail catastrophically in the event of an earthquake), until I looked out the window and saw the train station.  A train pulling out of the station had hit the emergency brakes and was stopped within 20 feet — again, just someone doing what he was trained for.  A few seconds after the train stopped, after reporting his status, he would have gotten on the loudspeakers and apologized for the inconvenience caused by the earthquake.  (Seriously, it’s in the manual.)

Everything Pretty Much Worked

Let’s talk about trains for a second.  All Japanese trains survived the tsunami without incident.  [Edited to add: Initial reports that some of them were washed away by the tsunami were incorrect.  Contact was initially lost with 5 trains, but all passengers and crew were rescued.  See here, in Japanese.]  Trains — including ones travelling in excess of 150 miles per hour — made immediate emergency stops and no one died.  There were no derailments.  There were no collisions.  There was no loss of control.  The story of Japanese railways during the earthquake and tsunami is the story of an unceasing drumbeat of everything going right.

This was largely the story up and down Honshu.  Planes stayed in the sky.  Buildings stayed standing.  Civil order continued uninterrupted.

On the train line between Ogaki and Nagoya, one passes dozens of factories, including notably a beer distillery which holds beer in pressure tanks painted to look like gigantic beer bottles.  Many of these factories have large amounts of extraordinarily dangerous chemicals maintained, at all times, in conditions which would resemble fuel-air bombs if they had a trigger attached to them.  None of them blew up.  There was a handful of very photogenic failures out east, which is an occupational hazard of dealing with large quantities of things that have a strongly adversarial response to materials like oxygen, water, and chemists.  We’re not going to stop doing that because modern civilization and its luxuries like cars, medicine, and food are dependent on industry.

The overwhelming response of Japanese engineering to the challenge posed by an earthquake larger than any in the last century was to function exactly as designed.  Millions of people are alive right now because the system worked and the system worked and the system worked.

That this happened was, I say with no hint of exaggeration, one of the triumphs of human civilization.  Every engineer in this country should be walking a little taller this week.  We can’t say that too loudly, because it would be inappropriate with folks still missing and many families in mourning, but it doesn’t make it any less true.

Let’s Talk Nukes

There is currently a lot of panicked reporting about the problems with two of Tokyo Electric’s nuclear power generation plants in Fukushima.  Although few people would admit this out loud, I think it would be fair to include these in the count of systems which functioned exactly as designed.  For more detail on this from someone who knows nuclear power generation, which rules out him being a reporter, see here.

  • The instant response — scramming the reactors — happened exactly as planned and, instantly, removed the Apocalyptic Nightmare Scenarios from the table.
  • There were some failures of important systems, mostly related to cooling the reactor cores to prevent a meltdown.  To be clear, a meltdown is not an Apocalyptic Nightmare Scenario: the entire plant is designed such that when everything else fails, the worst thing that happens is somebody gets a cleanup bill with a whole lot of zeroes in it.
  • Failure of the systems is contemplated in their design, which is why there are so many redundant ones.  You won’t even hear about most of the failures up and down the country because a) they weren’t nuclear related (a keyword which scares the heck out of some people) and b) redundant systems caught them.
  • The tremendous public unease over nuclear power shouldn’t be allowed to overpower the conclusion: nuclear energy, in all the years leading to the crisis and continuing during it, is absurdly safe.  Remember the talk about the trains and how they did exactly what they were supposed to do within seconds?  Several hundred people still drowned on the trains.  That is a tragedy, but every person connected with the design and operation of the railways should be justifiably proud that that was the worst thing that happened.  At present, in terms of radiation risk, the tsunami appears to be a wash: on the one hand there’s a near nuclear meltdown, on the other hand the tsunami disrupted something really dangerous: international flights.  (One does not ordinarily associate flying commercial airlines with elevated radiation risks.  Then again, one doesn’t normally associate eating bananas with it, either.  When you hear news reports of people exposed to radiation, keep in mind, at the moment we’re talking a level of severity somewhere between “ate a banana” and “carries a Delta Skymiles platinum membership card”.)

What You Can Do

Far and away the worst thing that happened in the earthquake was that a lot of people drowned.  Your thoughts and prayers for them and their families are appreciated.  This is terrible, and we’ll learn ways to better avoid it in the future, but considering the magnitude of the disaster we got off relatively lightly.  (An earlier draft of this post said “lucky.”  I have since reworded because, honestly, screw luck.  Luck had absolutely nothing to do with it.  Decades of good engineering, planning, and following the bloody checklist are why this was a serious disaster and not a nation-ending catastrophe like it would have been in many, many other places.)

Japan’s economy just got a serious monkey wrench thrown into it, but it will be back up to speed fairly quickly.  (By comparison, it was probably more hurt by either the Lehman Shock or the decision to invent a safety crisis to help out the US auto industry.  By the way, wondering what you can do for Japan?  Take whatever you’re saying currently about “We’re all Japanese”, hold onto it for a few years, and copy it into a strongly worded letter to your local Congresscritter the next time nativism runs rampant.)

A few friends of mine have suggested coming to Japan to pitch in with the recovery efforts.  I appreciate your willingness to brave the radiological dangers of international travel on our behalf, but that plan has little upside to it: when you get here, you’re going to be a) illiterate b) unable to understand instructions and c) a productivity drag on people who are quite capable of dealing with this but will instead have to play Babysit The Foreigner.  If you’re feeling compassionate and want to do something for the sake of doing something, find a charity in your neighborhood.  Give it money.  Tell them you were motivated to by Japan’s current predicament.  You’ll be happy, Japan will recover quickly, and your local charity will appreciate your kindness.

On behalf of myself and the other folks in our community, thank you for your kindness and support.

[本投稿を日本語にすると思っておりますが、より早くできる方がいましたら、ご自由にどうぞ。翻訳を含めて二次的著作物を許可いたします。詳細はこちらまで

This post is released under a Creative Commons license.  I intend to translate it into Japanese over the next few days, but if you want to translate it or otherwise use it, please feel free.]

[Edit: Due to overwhelming volume and a poor signal-to-noise ratio, I am closing comments on this post, but I encourage you to blog about it if you feel strongly about something.]

Japanese Disaster Micro-Update

Apologies for not posting this earlier — I put notices on my business websites but forgot that a lot of folks know me solely through the blog:

  • I live in Gifu, which is quite far from the earthquake epicenter.  We got shaken up a bit, but no permanent damage was done.  We’re landlocked so, unless the mountains fall into the sea, tsunamis are not an issue for us.
  • The people I’m close to in Japan are all OK.
  • We really appreciate your expressions of concern and prayers.
  • If you are wondering “What can I do?”, every day is a good day for charity.  I recommend the Red Cross or your local favorite charity.  In particular, disaster relief charities will use money collected today to help the folks affected by the next major incident, and it is highly probable that they are less well-situated than Japan is — we’re probably as well-prepared as anybody could be.

Thanks as always.  We’ll pull through this, don’t worry.

Regards,

Patrick

My Biggest Frustration With Google AdWords

Last week, I had an opportunity to talk with Andy Brice, who sells software for wedding seating plans and the like.  He is an absolute genius with AdWords, and gave me some ideas on ways to improve my performance.  I immediately started to implement them, full of the excitement of a new project and wondering why I don’t spend more time optimizing AdWords.

Oh, right.

There were another 15 ads which I added last Friday-ish and are still Under Review.  Under Review is Google-speak for “We aren’t sure that this ad complies with our policies yet.”  While an ad is Under Review, it doesn’t show anywhere, and you aren’t learning anything by having it.

Dealing With Shades of Grey

Google has a variety of businesses which it does not want to or legally cannot do business with.  To prevent them from using AdWords, they exercise prior restraint on AdWords copy, not letting their ads run until a human at Google has approved them.

One of the businesses that Google doesn’t want advertising (in the US, at any rate) is gambling.  Bingo is a form of gambling.  Bingo Card Creator is not a form of gambling — it is a form of software which helps elementary schoolkids learn to read.  This makes it rather hard to write focused, relevant advertisements responsive to customer queries like [how do I make a US presidents bingo card] which sell Bingo Card Creator without using the word “bingo” anywhere.

Google is, to all appearances, just using a keyword-based blacklist.  I guess all the eats-Bayesian-classifiers-for-lunch PhDs work in search and Gmail spam filtering, where they’ve clearly got an aptitude for understanding that words can have multiple meanings.  OK, fine, but at least the remaining boffins can do a blacklist correctly?

Well, not so much.

  • Using Google’s Copy Ad feature to copy an ad, word for word, between ad groups will cause the new copy to go back into review purgatory.  This is despite that theoretically being a content-neutral action and a core task for advertisers, because many flavors of AdWords optimization rely on keywords being partitioned correctly into focused ad groups.
  • Changing so much as a character of the ad, including landing page URLs, will cause the ad to get flagged again.  This only affects good advertisers.  Bad advertisers can presumably figure out how to serve whatever content they want on http://example.com/approved .  Pulling a bait-and-switch is absolutely trivial, since you have full control over what your own server serves to users.  This rule only inconveniences compliant advertisers, who get thrown into review purgatory every time they e.g. try to add another tracking parameter to their landing pages, switch from http:// to https://, etc etc.  I get the feeling I’m supposed to create five copies of each ad, pointing to /lp1 … /lp5 with identical content, and then if I need to do any testing I should get crafty with redirects or what have you later.  That’s insane: it is extra work that is directly against the spirit of the rules and, unlike actual compliance, it works.

Scalable Communication Methods

According to Google:

We review all ads in the order they’re received, and we work to review all ads in our program as quickly as possible, usually within 1 to 2 business days or sooner.

If there were only 48 hours of lag time inserted every time I touched an AdWords ad, this would be annoying but tractable.  It would lengthen my time through the idea creation/validation loop (Lean Startup fans know why that is a Very Bad Thing), but I could still get work done by batching all my edits together and then twiddling my thumbs for 48 hours.

Sadly, Google routinely falls short of their announced level of service.  And when I say “Falls short”, I mean “Ads can sit for weeks ‘Under Review’ and never be approved.”

This leads you to have to contact Google Customer Service to be able to get Google to give permission to give Google money.

Google Customer Service: Welcome to Kafka

The first rule of Google Customer Service is that Google does not have Customer Service.  They prefer what Chief Engineer Matt Cutts describes as “scalable communication methods”: there are like a bazillion of you, there are only a few tens of thousands of us, instead of actually speaking to a human being you should read a blog post or watch a video or talk to a machine.  It is a wonderful, scalable model… when things work.

Anything which introduces a mandatory customer service interaction with Google is a process designed for failure.  AdWords approval requires a customer service interaction.  Catch-22, to mix literary metaphors.

The “scalable communication methods” like AdWords Help have this to say about contacting customer service with regards to ad approvals:

Our Support team won’t be able to help you expedite this process.

That is not actually a true statement (which, incidentally, describes much of AdWords Help).  Length of time from ad submission to approval is, in my experience, unbounded (literally, weeks can go by without approval).  Length of time from complaining to Support to approval: a day or two.  The most helpful Google employee I’ve ever Internet-met (name withheld to protect him from whatever dire punishments await someone who attempts to help customers) told me that my workflow should literally be 1) Submit ad 2) Submit ticket to get ad looked at, if I persistently fell into Under Review.

Google apparently knows it, too, since they have special-cased out the CS interaction for dealing with Ad approvals:

After filling in everything, I hit Submit expecting to be taken to a page which had an “OK, now actually tell us what the problem is” comment box.  No need — it has been optimized away!  Google doesn’t even want that much interaction.  (The last time I went through this — sometime last year — I recall there being a freeform field, limited to 512 characters or so.  I always used it to explain that I am not a gambling operation and if they want confirmation they can read the AdWords case study about my business.)

Google’s computers then weighed my support request and found it wanting:

Dear Advertiser,

Thank you for your e-mail. We understand you’d like us to review your ad.
When you submit new ads or make changes to existing ads, they’re
automatically submitted for review.

We work to review all ads in our program as quickly as possible. You
should receive an email notification stating the approval status of your
ads pending review within the next 3 business days. You can view the
status of your ad any time in your account. The “Status” column in the
“Ads” tab displays information on the current state of an individual ad
variation.

For a list of Ad Approval Statuses, visit
http://adwords.google.com/support/aw/bin/answer.py?hl=en&answer=138186

We are working as quickly as possible to get everyone up and running and
should get to yours soon! If you have a different question, which doesn’t
refer to pending ad approval, please get back to us via the ‘Contact Us’
link in the Help Center at https://adwords.google.com/support/aw/?hl=en.
Be sure to choose the category that is most relevant to your question.

Sincerely,

The Google AdWords Team

Well, at least the templating engine correctly replaced $BRUSHOFF_LETTER, but in terms of customer communication:

  • You asked me to put in my name… you might want to think about using it.
  • As much as I appreciate your False! Enthusiasm! if the next line of your letter is going to be Eff Off And Die then maybe you should take out the exclamation points and give them to a Ruby programmer.  (We can always use more.)
  • If the original timeline was 1-2 business days and the timeline three days later is “within 3 business days”, can we update them so that they quote it consistently?  Or maybe put something like “We get to 98.2% of approvals within 3 business days.”  (Or 2.89% of approvals within 3 business days, as the case may be.)

Google’s Isolation From Market / Customer Pressure

Google theoretically values my business — I pay them $10,000 a year and would love to pay more.  Indeed, they can find my email address and have a human contact it when they want to do ad sales.  (I got an offer recently to set up a call with one of their AdWords Strategists to discuss optimization of my account… which is great, but previous experience leads me to believe he would use the same reports I have access to, make decisions with little understanding of my business, and then leave it to me to actually schedule the new ads/keywords and run headlong into Pending Review purgatory.)  But they are not doing very well lately at convincing me they actually care.  And they’ll still make a bazillion dollars without that, so no harm done.

In normal markets, I would be strongly tempted to take my business to vibrant competitive offerings.  Sadly, Google is pretty much the only game in town for viable CPC advertising: even if Microsoft/Yahoo exorcized the abominations haunting their UIs, they would not have enough inventory to matter for me in my niche (I’ve tried before).

Which leaves me with only one option: trying out my own scalable communication methods, and hoping someone in the Googleplex reads this and takes action to unbork this process (ideally, for a large class of advertisers).  It is the Internet equivalent of putting a message in a bottle and then throwing it into the ocean, but that is still an improvement on the normal channels.

Hacking Customers' Technology Adoption Cycles

YCombinator just released their semi-annual application for companies to incubate.  One of the new questions this time around is “How will you get users?”  I think that is a great question to think about for everybody in business — perhaps the great question to think about.  Customer acquisition is one of the easiest places to screw up as a startup, particularly for technical founders (who, in their previous lives, have probably never had to do it for anything).

I’m not applying to YC this time around, but I always fill out the application to force myself to talk through my business strategy.  I had one thought which sounded worthwhile enough to share: customer acquisition can be hacked.  You can take the current conventional wisdom in the market of how to get customers to use solutions, identify its weak points, and aggressively target them.  That can, potentially, be as important (or more important) than the same applied to the actual product.

Enemy #1: The Technology Adoption Cycle

Let’s assume that you’re capable of successfully identifying a problem customers have and solving it.  Those are both highly non-trivial, but put them out of scope for the moment: people’s hair is on fire, you’re selling fire extinguishers, life should be grand.  Life is often not quite so grand, because you can produce a wonderful product which creates value and fail to sell it to folks.

Most startups are not creating an entirely new solution out of whole cloth.  Somewhere out there people are currently experiencing the problem you are solving, and they’re dealing with it somehow.  They might be ignoring it or gritting their teeth.  They might be using some inferior solution which they got from your competitors, if you have competitors (you should have competitors — if you don’t, you probably aren’t doing something people care about).

Your competitors had to see people through a product adoption cycle:

  1. Identify people with the problem
  2. Teach them that the solution exists
  3. Successfully sell them on the solution
  4. Prevent them from leaving the solution for a competing solution

In actual practice, this adoption cycle is frequently long and arduous.  (If it were short and easy, there wouldn’t be any money in it.)

Your competitors, if they are established businesses, are probably very good at maneuvering customers through the technology adoption cycle as it exists in the market today.  For example, if grading students is a problem, your competitor might very well be successfully selling school districts on their gigantic consultingware grading solution which costs six figures an installation.  Since they can still make the rent and keep the lights on, you can infer that their business probably works.  Their marketing team is generating sufficient leads, and their sales team converts some of them.

But you probably don’t want to do what they’re doing, because they’re better at being them than you are.

Hacking The Product Adoption Cycle

Startups are not the world’s most obvious choice of employment for people who enjoy coloring between the lines.  If you execute the competition’s playbook for acquiring customers, you are probably going to get crushed by them, because

  • they know more about the market than you do
  • they have a commanding head start
  • they have large amounts of resources to throw at the problem

On the other hand, it is entirely possible that:

  • they have stopped learning about the market
  • they have a commanding head start running in a suboptimal direction
  • they have large amounts of resources which, for reasons of switching costs and politics, can’t be reallocated to more efficient approaches

These statements aren’t just true about the product — sure, they might have a crufty old VB6 app and you have the new Node.js hotness.  They are equally true about the customer acquisition process.  You’re competing with their business, not with their product, so you could possibly either focus your innovation on customer acquisition or, more likely, use innovation on both customer acquisition and product in a mutually supportive manner.

Examples Of Hacking This

Freemium isn’t a business model so much as it is a customer acquisition tactic.  In markets dominated by expensive solutions with huge switching costs and uncertainty about success with technology changes, freemium can be very compelling: the self-serve model allows you to do less consultative sales (with the multi-month purchase cycles, large sales teams, and politicking that entails) and instead focus your efforts on getting leads and converting them.  This plays to a very different skill set versus traditional enterprise B2B sales, and it is much more forgiving of small teams, since you’re deputizing your free users as internal sales champions and praying that they can do the consultative sales that your non-existent sales force isn’t doing.  This also lets you crack into markets where any model which requires consultative sales automatically is priced out of the market — essentially, anything where customer lifetime value is less than $75,000, give or take.

Monthly billing is another hack.  Customers are irrational and their processes are broken.  One artifact of those processes is that there is a step increase in difficulty when the price rises by $1, if the price was already at the company’s magic number for the maximum that can be put on a corporate credit card or signed for on a non-manager’s authority.  Monthly billing defeats this step function because, even if the total lifetime cost of the solution goes up, the largest amount ever billed at once might well cross back under that critical threshold.  This means that there is no longer a total no-man’s land between $1,000 and $75,000 in lifetime value. (Is this a hack?  Yes.  If you bill a Fortune 500 company product manager $80 a month, you are essentially conspiring with him against his accounting controls.  Not that there is anything wrong with that.  You can even explicitly sell that as a benefit to him, just like you sell SaaS as allowing him to avoid having to talk to IT to get the stuff he needs to do his job.)
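To make the step function concrete, here is a minimal sketch of the arithmetic.  The $1,000 approval limit and the 36-month customer lifetime are illustrative assumptions, not figures from any real company; only the $80 a month comes from the example above.  `needs_signoff` is a hypothetical helper.

```python
# Hypothetical: the largest single charge an employee can approve alone.
APPROVAL_LIMIT = 1_000


def needs_signoff(largest_single_charge: float, limit: float = APPROVAL_LIMIT) -> bool:
    """True when any one charge exceeds what can be approved without a manager."""
    return largest_single_charge > limit


monthly_price = 80      # price per month, from the example in the text
lifetime_months = 36    # assumed customer lifetime, for illustration only
lifetime_value = monthly_price * lifetime_months

# Billed once up front, the single charge is the whole lifetime value.
upfront_blocked = needs_signoff(lifetime_value)

# Billed monthly, the largest single charge is just one month's price.
monthly_blocked = needs_signoff(monthly_price)
```

Under these assumptions the same customer relationship either trips the accounting control or sails under it, depending only on how the charges are sliced.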

Online marketing expertise hacks through the ridiculous inefficiencies of offline marketing.  Many startups can run rings around their traditional competitors in online marketing, for example due to savvier SEO that leverages their strengths in execution speed, technological savvy, and community ties.  For example, my wee little business competes directly with Scholastic Publishing, who has 10,000 employees and access to public capital markets.  They also couldn’t spell SEO if you handed them a set of alphabet flash cards, which is good news for me.  You would think that “Well, a business which doesn’t have online marketing expertise could just hire for it”, but after you get past the level of “let’s make a website — it should probably have title tags and some of those keywords, too”, everyone who tries this finds that it is murderously difficult to hire competent SEOs right now.  (If you disagree, I have some clients who would love to meet you.)  At the same time, I couldn’t possibly compete with the relationships which get their competing products on shelves at tens of thousands of retail locations… but then again, I don’t have to pay 50 cents of every dollar of sales to the retailer, either.

Taking A Hack From Tactic To Strategy

I think this isn’t exactly a new insight.  There are lots of folks who, when asked for their marketing plan, will say “Oh, we’re going to get lots of search traffic” (indeed, that is probably second only to “it will grow virally” in terms of signaling “has probably not thought this through.”)  What separates hopes and dreams of future success from very valuable businesses is a strategy which, with execution and refinement over time, will actually achieve the goals.

We often hear products described using something like “It’s like Facebook, except for dogs.”  How about, instead, describing businesses like “It’s like Quicken, except Quicken sells primarily through boxed software channels and we’re going to sell primarily through banks which will deal with us for a cut of the sale price and the ability to deepen relations with small business customers, who consume lots of high margin services and stay locked in for decades at a time.”  (That may or may not actually be true.)

We often accept previous experience or minimal proof-of-concept prototypes/MVPs in lieu of a functioning product when evaluating whether someone is capable of executing on building something.  Why not do essentially the same for proving that one is likely to get customers?  A previous background in revenue maximization through negotiating cross selling deals for banks, or evidence that you have enthusiasm from a few bankers who like the concept and want to hear more when you have something to show, demonstrates a certain likelihood that marketing challenges will be overcome like technical challenges will be overcome.

Similarly, for a startup hoping to make inroads for SEO, I’d be thinking less along the lines of “we’ll sprinkle some SEO on our website” and more along the lines of specific plans for scalable content generation, securing backlinks at scale, and winning the support of influencers either in the niche or in other addressable niches which your competition may not be aware are relevant to that facet of their business.

Product Supports The Marketing And Vice Versa

I have a wee little heresy as an engineer: I think that you can make a perfectly viable business out of a product which is not better than competitors, solely by improving on the method of selling it.  Farmville (and whatever Zynga has reskinned it as this week), for example, is not superior to all other options for entertainment… it just beats the pants off of most of their viral spread patterns, because promoting your use of the game is the core gameplay mechanic.  (You can also do this in more socially beneficial ways than Farmville, don’t worry.  I have a competitor in the bingo business whose product is very close in quality to my product.  They sell to schools via a catalog.  I sell to teachers via a website.  Despite solving the same problem for the same end-users our businesses are like ships passing in the night.  Hilariously, at least a few of my customers actually own both pieces of software, because the people who buy from the catalogs never bothered telling the people who use the websites.)

However, this doesn’t mean you can’t innovate on both the marketing and the product.  Indeed, since they feed off each other, that is probably substantially more effective than innovating on one or the other.  Imagine what a juggernaut World of Warcraft would be if they nailed their game’s quality as much as they did and also had Zynga’s viral loop and monetization model.  That hypothetical WoW could probably deal with Chinese net regulators by buying China.

(It’s easy to say this in retrospect: empirically, millions of employed adults with lots of disposable income spend much of their free time playing WoW.  They spend huge amounts of money on buying status for themselves — cars, diamonds, big houses.  They clearly value their experience in the game.  Therefore, they should be willing to buy status in the game, too… and since buying status is more being seen as having paid lots of money than it is about any particular artifact received, this should go over very well.  I mean, crikey, in a world where encrusted mollusk discharges say “I love you” anything is possible… anyhow, it is easy to say that in hindsight.  The challenge for startups is identifying that sort of synergy between customer adoption and the product in advance, and communicating that it is likely enough to happen to risk betting on.)

Hacking A Non-Computer System Whose Source Is Closed And Updated Continuously

We all know the first iteration of the product is going to suck (hopefully in the sense of “not meet customers’ needs” more than “a broken, unreliable mess”).  The first iteration of the marketing strategy is also going to suck (hopefully in the sense of “fail to generate the expected level of success” rather than “like shouting to an audience of deaf ants during a hurricane”).  Just like you can use the Lean Startup principles to modify your product and marketing message such that it comes closer to achieving a match with what some people actually need, you can also use spiritually similar disciplines to iterate on customer acquisition strategies.  There is as large a solution space in them as there is in the product space.  Maybe you need to try SEO and see that it doesn’t do a great job in your market, for your customers, while an affiliate channel performs better.  If you’re experimenting, measuring, and moving with a purpose as opposed to the traditional method of “throw stuff at the wall and see what sticks”, you will hopefully have a bit of success.
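As a minimal sketch of what “measuring” might look like, the snippet below compares cost per acquired customer across two channels.  Every channel name and number here is made up for illustration, and `ChannelResult` is a hypothetical structure, not something from a real analytics tool.

```python
from dataclasses import dataclass


@dataclass
class ChannelResult:
    name: str
    spend: float      # dollars spent on the experiment
    visitors: int     # people the channel sent to the site
    customers: int    # visitors who ended up paying

    @property
    def conversion_rate(self) -> float:
        return self.customers / self.visitors if self.visitors else 0.0

    @property
    def cost_per_customer(self) -> float:
        # A channel with no customers is treated as infinitely expensive.
        return self.spend / self.customers if self.customers else float("inf")


# Made-up experiment results for two channels with equal budgets.
results = [
    ChannelResult("seo", spend=500.0, visitors=2_000, customers=10),
    ChannelResult("affiliate", spend=500.0, visitors=800, customers=25),
]

# Pick the channel that acquires customers most cheaply.
best = min(results, key=lambda r: r.cost_per_customer)
```

Note that the channel with more visitors is not the winner in this made-up data, which is the kind of thing you only learn by measuring past the top of the funnel.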

I’d love to hear if you have comments.

[Memo to self: Prior to ever actually applying for YC, I should practice thinking big thoughts and then writing small thoughts.  Those form fields are tiny!]
