
Security Lessons Learned From The Diaspora Launch

Last week, Diaspora — the OSS privacy-respecting social network — released a “pre-alpha developer preview” of their source code.  I took a look at it, mostly out of curiosity, and was struck by numerous severe security errors.  I then spent the next day digging through their code locally and trying to get in touch with the team to address them, privately.  In the course of this, I mentioned obliquely on Hacker News that the errors existed, and subsequently was interviewed by The Register and got quoted in a couple of hundred places.

The money quote most outlets went for was:

The bottom line is currently there is nothing that you cannot do to someone’s Diaspora account, absolutely nothing.

I’d like to back up that contention, now that it is safe(r) to do so.

Reporting security bugs is a funny business: any description of the error sufficient to resolve it is probably sufficient to create exploit code.  This is why I was fairly circumspect about the exact mechanism for the errors, and why Steve Klabnik also mostly declined to give specifics when describing the state of the codebase.  Since the specific errors I reported are now patched, I’m going to disclose what they were, so that budding Rails developers who care about security do not inadvertently give attackers the ability to do anything they want.

By the way, if you’re looking for Rails security advice, I recommend the official guide and the OWASP list of web application vulnerabilities, which would have helped catch all of these.  Web application security is a very deep topic, and often involves unforeseen circumstances caused by the interaction of complicated parts which are not totally under the developers’ control.  That said, nobody should be making errors like these.  It hurts us as developers, it hurts our ecosystem, and it endangers our users in spite of the trust they have put in us.

I found somewhere in the ballpark of a half-dozen critical errors — it depends on how you count pervasive mistakes that undermined virtually every class in the system.  There were three main genres.  All code samples are pulled from Diaspora’s source at launch, were reported to the Diaspora team immediately, and have been reported to me as fixed.

Authentication != Authorization: The User Cannot Be Trusted

Code:

#In photos_controller.rb
def destroy
    @album = Album.find_by_id params[:id]  # BUG
    @album.destroy
    flash[:notice] = "Album #{@album.name} deleted."
    respond_with :location => albums_url
end

This basic pattern was repeated several times in Diaspora’s code base: security-sensitive actions on the server used the params hash to identify pieces of data they were to operate on, without checking that the logged in user was actually authorized to view or operate on that data. For example, if you were logged in to a Diaspora seed and knew the ID of any photo on the server, changing the URL of any destroy action from the ID of a photo you own to the ID of any other photo would let you delete that second photo. Rails makes exploits like this child’s play, since URLs to actions are trivially easy to guess and object IDs “leak” all over the place. Do not assume that an object ID is private.

(There is a second error here, by the way: the code doesn’t check to see if the destroy action is called by an HTTP POST or not. This means that an overenthusiastic browser might follow all links from a page, including the GET link to a delete action, and nuke the photo without any user action telling it to do so.)
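A minimal sketch of the guard, in plain Ruby with illustrative names (Rails routing can enforce this for you by only mapping destroy to DELETE): destructive actions should refuse to answer GET requests at all.

```ruby
# Sketch: destructive actions should only answer to POST/DELETE, so a
# link-prefetching browser issuing GETs can never trigger them.
def destroy_allowed?(http_method)
  %w[POST DELETE].include?(http_method)
end

destroy_allowed?("GET")     # => false: the prefetcher gets turned away
destroy_allowed?("DELETE")  # => true: an intentional form submission works
```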

You might think “Surely Diaspora checks to see if you’re logged in, right?” You’re right: they use Devise, a library which handles authentication, to verify that you can only get to the destroy action if you’re logged in. However, Devise does not handle authorization — checking to see that you are, in fact, permitted to do the action you are trying to do.

Impact:

When Diaspora shipped, an attacker with a free account on any Diaspora node had, essentially, full access to any feature of the software vis-a-vis someone else’s account. That’s pretty bad, but it gets even better when you combine it with other errors.

How to avoid this:

Check authorization prior to sensitive actions. The easiest way to do this (aside from using a library to handle it for you) is to take your notion of a logged in user and only access members through that. For example, in my software, any action past a login screen has access to a @user variable. If an action needs to access one of their print_jobs, it calls @user.print_jobs.find(params[:id]). If they have subverted the params hash, that will find no print_job (because of how associations scope to the user_id) and they’ll instantly generate an ActiveRecord exception, stopping any potential nastiness before it starts.
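A minimal sketch of that pattern, with PrintJob, RecordNotFound, and the in-memory table as illustrative stand-ins (not Diaspora or Rails code):

```ruby
# Illustrative stand-in for ActiveRecord::RecordNotFound.
class RecordNotFound < StandardError; end

PrintJob = Struct.new(:id, :user_id)

# Stand-in for the database table.
ALL_PRINT_JOBS = [PrintJob.new(1, 100), PrintJob.new(2, 200)]

class User
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Emulates @user.print_jobs.find(params[:id]): the search is scoped to rows
  # this user owns, so a forged ID raises instead of touching another's row.
  def find_print_job(job_id)
    job = ALL_PRINT_JOBS.find { |j| j.user_id == id && j.id == job_id }
    raise RecordNotFound, "no PrintJob #{job_id} for user #{id}" unless job
    job
  end
end

alice = User.new(100)
alice.find_print_job(1)    # fine: she owns job 1
# alice.find_print_job(2)  # raises RecordNotFound: the ID was forged
```

The design point: the attacker-controlled ID is only ever interpreted relative to the logged-in user, so there is no code path by which it can reach someone else's data.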

Mass Assignment Will Ruin Your Day

Code:

#users_controller.rb
def update
  @user = User.find_by_id params[:id]  # <-- uh oh, no auth check
  prep_image_url(params[:user])

  @user.update_profile params[:user]
  respond_with @user, :location => root_url
end

#user.rb
def update_profile(params)
  if self.person.update_attributes(params)  #  <-- insert input directly to DB
  ...
  end
end

Alright, so we know that if we forget authorization then we can do arbitrary bad things to people. In this case, since the user update method is unsecured, we can meddle with their profiles. But is that all we can do?

Unseasoned developers might assume that an update method can only update the fields on the web form which submits to it. For example, this form is fairly benign, so maybe all someone can do with this bug is deface my profile name and email address:

[Screenshot: Diaspora profile update page]

This is catastrophically wrong.

Rails by default uses something called “mass assignment”, where update_attributes and similar methods accept a hash as input and sequentially call the accessor for every key in the hash. Objects will update both database columns (or their MongoDB analogues) and also call parameter_name= for any :parameter_name in the hash that has that method defined.
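Stripped of the framework, mass assignment behaves roughly like this (Doc is an illustrative class, not Diaspora's):

```ruby
# What mass assignment does under the hood: call the setter named by each
# key in the hash, with no whitelist at all.
class Doc
  attr_accessor :name, :owner_id

  def update_attributes(hash)
    hash.each { |key, value| send("#{key}=", value) }
    true
  end
end

doc = Doc.new
# The attacker adds an extra "owner_id" field to the form post...
doc.update_attributes("name" => "Bob", "owner_id" => "attacker")
doc.owner_id  # => "attacker": the smuggled field walked right in
```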

Let’s take a look at the Person object to see what mischief this lets us do. (Right, instead of updating the profile, update_profile updates the Person: Diaspora’s internal notion of the data associated with one human being, as opposed to the login associated with one email address (the User). Calling something update_profile when it is really update_person is a good way to hide the security implications of code like this from a reviewer. Names matter — make sure they’re accurate.) What methods and fields do you expose…

#Person.rb
class Person
  ...
  key :url,            String
  key :diaspora_handle, String, :unique => true
  key :serialized_key, String   #This is your public/private encryption key pair.  OOPS.

  key :owner_id,  ObjectId   #You don't want me changing this one either...

  one :profile, :class_name => 'Profile'
  many :albums, :class_name => 'Album', :foreign_key => :person_id
  belongs_to :owner, :class_name => 'User'   #... because it lets me own you!  Literally!

end

#User.rb
one :person, :class_name => 'Person', :foreign_key => :owner_id  #Oh dear.

This is painful: by changing a Person’s owner_id, I can reassign the Person from one account (User) to another, allowing me both to deny arbitrary victims the use of the service and to take over their accounts, letting me impersonate them, access their data at will, etc etc. This works because a one association in MongoMapper picks the first matching entry in the DB it can find, meaning that if two Person documents have the same owner_id, your account will non-deterministically control one of them. So I’ll assign your Person#owner_id to be my owner_id, which gives me a fifty-fifty shot at owning your account. If that is annoying for me, I can always assign my own Person#owner_id some nonsense string, de-linking my Person and making sure current_user.person finds your data when I’m logged in.
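A toy illustration of why duplicate owner_ids make ownership a coin flip, with a plain array standing in for the database:

```ruby
# A `one` association just returns the first matching document; "first" is
# whatever order the database happens to hand back.
people = [
  { :name => "attacker", :owner_id => "abc" },  # attacker copied victim's owner_id
  { :name => "victim",   :owner_id => "abc" },
]

people.find { |p| p[:owner_id] == "abc" }[:name]  # => "attacker" here;
# against a real DB, either document could come back first
```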

But wait, there is more!: Note the serialized_key column. Can you guess what that is for? Well, if you follow some spaghetti in the User class, that is their serialized public/private encryption key pair. You might have heard that Diaspora seeds use encryption when talking between each other so that the prying eyes of Mark Zuckerberg can’t read your status updates. Well, bad news bears: the attacker can silently overwrite your key pair, replacing it with one he generated. Since he now knows your private key, regardless of how well-implemented your cryptography is, he can read your messages at will.

This is what kills most encryption systems in real life. You don’t have to beat encryption to beat the system, you just have to beat the weakest link in the chain around it. That almost certainly isn’t the encryption algorithm — it is some inadequacy in the larger system added by a developer who barely understands crypto but who trusts that sprinkling it in magically makes it better. Crypto is not soy sauce for security.

Is this a hard attack? No. You can do it with no tool more complicated than Firefox with Firebug installed: add an extra parameter to the form, switch the submit URL, own any account you like. It took me two minutes to find this vulnerability (I looked at the users controller first, figuring it was a likely place for bad stuff to happen if there was bad stuff to be found), and started trying to get the word to the Diaspora team immediately. It literally took longer to get Diaspora running than it took to create a script weaponizing this.

Steps to avoid: First, fix the authentication. That won’t prevent this attack, though — I can still screw up my Person by changing its owner_id to be yours (and do this an arbitrary number of times), virtually guaranteeing that I can successfully disassociate your account from your person.

After you fix authentication, you need to start locking down write access to sensitive data. Start by disabling mass assignment, which should be off in any public-facing Rails app. The Rails team keeps it in because it saves lines of code and makes the 15-minute blog demo nicer, but it is an easy security hole virtually anywhere it exists. Consider it guilty until proven innocent.
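The idea behind a whitelist such as ActiveRecord's attr_accessible can be sketched as a plain function (the field names are illustrative):

```ruby
# Only fields the form is explicitly allowed to set survive; everything
# else, owner_id and serialized_key included, is silently dropped.
ALLOWED_FIELDS = [:first_name, :last_name, :image_url]

def safe_attributes(params)
  params.select { |key, _| ALLOWED_FIELDS.include?(key.to_sym) }
end

dirty = { :first_name => "Mallory", :owner_id => "attacker" }
safe_attributes(dirty)  # => { :first_name => "Mallory" }
```

Run the params hash through a filter like this before it ever reaches update_attributes, and a smuggled form field becomes a no-op instead of an account takeover.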

Second, if your data store allows it, you should explicitly make as much as feasible unwritable. ActiveRecord lets you do this with attr_readonly — I’m not sure whether you can do it with MongoMapper or not. There is almost certainly no legitimate reason for owner_id to be reassignable.

NoSQL Doesn’t Mean No SQL Injection

Code:

def self.search(query)
  Person.all('$where' => "function() { return this.diaspora_handle.match(/^#{query}/i) ||
               this.profile.first_name.match(/^#{query}/i) ||
               this.profile.last_name.match(/^#{query}/i); }")
end

Diaspora uses MongoDB, one of the new sexy NoSQL database options. I use a few myself. They have a few decades less experience getting exploited than the old relational databases you know and love, so let’s start: I claim the code snippet above gives me full read access to the database, including to serialized encryption keys.

What the heck?!

Well, observe that due to the magic of string interpolation I can cause the string including the Javascript to evaluate to virtually anything I want. For example, I could inject a carefully constructed Javascript string to cause the first regular expression to terminate without any results, then execute arbitrary code, then comment out the rest of the Javascript.

We can get one bit of data about any particular person out of this function: whether they are in the result set or not. However, since we can construct the result set at will, we can make that a very significant bit indeed. One thing Javascript can do is take a string and convert it to a number. I’ll elide the code for this because it is boring, but it is fairly straightforward. After I have that Javascript, I can run a binary search to get someone’s serialized_key. “Return Patrick if his serialized key is more than 2^512. OK, he isn’t in the result set? Alright, return Patrick if his key is more than 2^256. He is in the result set? Return him if his key is more than 2^256 + 2^255. …”

If their key has 1,024 bits (wow, so secure), it will take me roughly 1,024 and change accesses to find it. That will take me, hmm, a minute? Two? I can now read your messages at will.
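The arithmetic of that binary search can be sketched generically in Ruby; this is the shape of the attack, not the actual injected Javascript:

```ruby
# Recover a secret integer using only a yes/no "is it at least X?" oracle,
# one bit per query. The search-result membership test plays the oracle.
def recover_secret(bits, oracle)
  value = 0
  (bits - 1).downto(0) do |i|
    guess = value | (1 << i)
    value = guess if oracle.call(guess)  # one query: secret >= guess ?
  end
  value
end

secret = 0b101101                    # pretend 6-bit "key"
oracle = lambda { |g| secret >= g }  # stand-in for one search request
recover_secret(6, oracle)            # => 45, in exactly 6 queries
```

One query per bit is why a 1,024-bit key falls in roughly a thousand requests.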

I think MongoDB will let me do all sorts of nastiness here aside from just reading parts of the person object: for example, I strongly suspect that I can execute state-changing Javascript (though I didn’t have any luck making a Lil Bobby Tables to drop the database; I only spent about two minutes on it), or join the Person document with other documents to read out anything I want from the database, such as User password hashes. That might be a fun project for someone who is not a complete amateur.

Code injection: fun stuff for attackers, not quite so fun for you.

How to avoid this:

Don’t interpolate strings into queries sent to your database! Use the MongoDB equivalent of prepared statements. If MongoDB doesn’t have prepared statements, don’t use it for your security-critical projects until it does, because you will be exploited.
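A hedged sketch of the first half of the fix: treat the query as data by escaping it into a regex value, rather than interpolating it into $where Javascript. (The Person.all call in the trailing comment is assumed MongoMapper-style usage, not verified against Diaspora's code.)

```ruby
# Build the search pattern from user input as data, not code: escape regex
# metacharacters so "(" or ".*" from an attacker stay literal text.
def safe_prefix_pattern(query)
  /^#{Regexp.escape(query)}/i
end

safe_prefix_pattern("pat").match?("patrick")  # => true: normal prefix search
safe_prefix_pattern(".*").match?("anything")  # => false: no wildcard smuggling

# Then pass the pattern as a plain query value, with no Javascript at all:
#   Person.all(:diaspora_handle => safe_prefix_pattern(params[:q]))
```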

Take Care With Releasing Software To End Users

Since making my public comments, I have heard — over and over again — that none of the above matters because Diaspora is in secret squirrel double-plus alpha unrelease and early adopters know not to put any data in it. False. As a highly anticipated project, Diaspora was guaranteed to (and did) have publicly accessible nodes available within literally hours of the code being available.

You might suppose that people who set up nodes are intelligent enough to evaluate the security consequences of running them. That is actually false (there are public nodes open to anyone), but we’ll run with it. Even if the node operators understand what they are doing, their users and their users’ friends who are invited to join The New Secure Facebook are not capable of evaluating their security on Diaspora. They trust that, since it is in their browser and endorsed by a friend, it must be safe and secure. (This is essentially the same process by which they joined Facebook — the zuckers.)

How would I have handled the Diaspora release? Well, candidly, I wouldn’t have released the code in the current state, and instead would have devoted non-trivial effort to securing it prior to release. If you put a gun to my head and said “Our donations came from 6,000 people who want to see progress, give me something to show them”, I would have released the code that they had with the registration pages elided, forcing people to only add new users via Rake tasks or the console. That preserves 100% of the ability of developers to work on the project, and for news outlets to take screenshots, without allowing technically unsophisticated people to successfully sign up to the Diaspora seed sites.

I don’t know if the Diaspora community understands how bad their security posture is right now. Looking at the public list of Diaspora seeds: while the team has put up a bold disclaimer that the software is insecure (which no one will read, because no one reads on the Internet — welcome to software, guys), many of the nodes explicitly advertise themselves as safer options which won’t reset their DB, so you won’t lose your work if you start on them today. That is irresponsible.

Is Diaspora Secure After The Patches?

No. The team is manifestly out of their depth with regards to web application security, and it is almost certainly impossible for them to gather the required expertise and still hit their timetable for public release in a month. You might believe in the powers of OSS to gather experts (or at least folks who have shipped a Rails app, like myself) to Diaspora’s banner and ferret out all the issues. You might also believe in magic code-fixing fairies. Personally, I’d be praying for the fairies, because if Diaspora is dependent on the OSS community their users are screwed. There are, almost certainly, exploits as severe as the above ones left in the app, and there almost certainly will be zero-day attacks by hackers who would like to make the headline news. “Facebook Competitor Diaspora Launches; All Users’ Data Compromised Immediately” makes for a smashing headline in the New York Times, wouldn’t you say?

Include here the disclaimer that I like OSS, think the Diaspora team is really cool, and don’t mean to crush their spirits when I say that their code is unprofessional and not ready to be exposed to dedicated attackers any time soon.

Automatically Minimizing Javascript and CSS with Rails

If you obsess over your YSlow and/or Page Speed performance scores, you might be wondering how to get your Rails app to automatically crunch Javascript and CSS files without requiring cumbersome rake tasks to be repeated during every deploy.  Fear not: you can quickly monkeypatch Rails to do this.

  1. Copy this gist into your lib/ directory.
  2. Create a config/initializers/javascript_minimization.rb as described here.
  3. Use your javascript_include_tag and stylesheet_link_tag helpers as normal, with the caching option turned on.
  4. Watch your users save 100kb.

Organized, Curated Lists of Best Posts From This Blog

When I started this blog four years ago, I never thought it would be seen by more than a few dozen people, so I never planned any sort of information architecture for it.  500 posts (300,000 words!) and several hundred thousand readers later, it is unwieldy to get to the good stuff unless you have been following along for the last couple of years, or have a week free.  To make this a little less annoying, with the help of an assistant from Hacker News I found the best 72 articles I’ve written, grouped them by category, and for ones which are logically related explained what the connections are (in particular, for the “experiment” / “results” pairs).

I hope you find it useful.  If I left out anything, or if there is a category you know I publish on but did not include in the list, please let me know in the comments.

New Trends In Startup Financing Explained For Laymen

Noted American technology investor and all-around smart guy Paul Graham wrote recently about emerging trends in startup funding, specifically that convertible notes and rolling closes are displacing the traditional equity rounds done at a fixed valuation with angel syndicates.

Did that sound like Greek to you?

Great, you might benefit from this translation of Financier into Geek.  (P.S. If you haven’t figured out the significance of it originally being written in Financier instead of in Geek, please, think it through.)  I originally wrote it as a comment on Hacker News but somebody asked me to put it somewhere easily findable.  I have elaborated with standard blog post formatting and graphs where I thought they helped the explanation:

Why We Care About Angel Investing

Startups raise money from investors to accelerate their growth into, hopefully, massively profitable businesses and/or massively large acquisitions from big companies.

One particular type of investor that invests in startups is called an angel investor. An angel investor is often an individual human being who is wealthy, frequently as a consequence of successful entrepreneurship. They invest anywhere from $25,000 to $250,000 or so.

Fundraising is painful, and requires a lot of time and focus from startup founders. To mitigate the pain, it is often structured in terms of “rounds”, where the startup goes out to raise a particular large sum of money all at once. For an angel round, let’s say that could be a million dollars. (n.b. It is trending down, as companies can now be founded for sums of money which would have been laughable a few years ago.)  Clearly we’re going to need to piece together contributions from a few angels here.

Why Angel Investing Frustrates Founders

Traditionally, one angel has been the “lead” angel, who handles the bulk of the organizational issues for the investors. The rest just sit by their phone and write checks when required. (Slight exaggeration.) Investors are often skittish, and they require social proof to invest in companies, so you often hear them say something like a) they’re not willing to invest in you but b) they are willing to invest in you if everybody else does. This leads to deadlocks: a group of investors, all of whom would invest in the company if the company were able to raise investment, fail to invest in the company because it cannot raise investment.

Startup founders are, understandably, frustrated by this.

What “Valuation” Means

All numbers below this point were chosen for ease of illustration only.  They do not represent typical valuations, round sizes, or percentages of companies purchased by angels.

One item of particular interest in investing is the valuation of the company. This gets into heady math, but the core idea is simple: if we agree that the company is worth $100 at this instant in time (the “pre-money valuation”), and you want to invest $100, then right after the company receives your investment, the company is worth $200 (the “post-money valuation”). Since you paid $100, you should own half the company.

Traditionally, the company has exactly one pre-money valuation (which is decided solely by negotiation, and bears little if any relation to what disinterested outside observers could perceive about the company). All investors receive slices in the company awarded in direct proportion to the amount of money they invest. Two investors investing the same amount of money receive the same sized slice of the company. This can be written as “they invested at the same valuation.”

The thesis of PG’s essay is that allowing investors to invest at the same valuation is not advantageous to the startup. Instead, by offering a discount to valuation for moving quickly, you can convince investors to commit to the deal early, thus starting the stampede from the hesitant investors who were waiting to see social proof.

For example, take the company from earlier. We said it was worth $100 prior to receiving investment, but that is not tied to objective reality. Say instead we’ll agree that it is worth $80… but only with respect to the 1st investor. He commits $20. $80 + $20 = $100, so he gets $20 / $100 = 20% of the company for $20, or $1 = 1%. This convinces a second investor to invest. He says “Can I get 20% for $20, too?” Not so fast, buddy, where were you yesterday? The company isn’t worth $80 any more. We think it is worth $105 now. (Did we just get through saying $100? Yes. But valuations are not connected to objective reality.) So you get $20 / ($105 + $20) = 16% of the company for your $20. Think that is fair? You do? OK, done.
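The dilution arithmetic above, as a quick sketch:

```ruby
# An investor's slice is what he put in divided by the post-money valuation
# (pre-money plus his own check).
def share_for(invested, pre_money)
  invested / (pre_money + invested)
end

share_for(20.0, 80.0)   # => 0.2  : the first investor's 20%
share_for(20.0, 105.0)  # => 0.16 : the second investor's 16%
```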

This continues a few times. The startup raises money — possibly more money, depending on how much the angels want in — with less hassle for the founders.

What Is A Convertible Note?  Why Do Founders Like Them?

We’ve been talking about just dollars so far, and alluding to control of the company as if it were equity like stocks, but there is a mechanism called “convertible notes” at play here. A convertible note is the result of a torrid affair between a loan and an equity instrument. It looks a bit like Mom and a bit like Dad. Like a loan, it charges interest: typically something fairly modest like 6 to 8%, much less than a credit card.

The tricky thing about convertible notes is that they convert into partial ownership of the company at a defined event, most typically at the next round of VC funding or at the sale of the company. So, instead of the first investor getting $20 = 20% of the company, he loans the company $20 in exchange for a promise like this: “You owe me $20, with interest. Don’t worry about paying me back right now. Instead, next time you raise money or sell the company, we’re going to pretend that I’m either investing with the other guy or selling with you. The portion of the company which I buy or sell will be based on complicated magic to protect both your interests and my interests. If you want to sweeten the deal for me, sweeten the magic.”

Do we understand why this arrangement works for both parties? It incentivizes investors to commit early, which lets startups raise more money with less pain. Because startups are in the driver’s seat, it also lets them avoid collusion among investors (“We decided we’d all invest in you, but we don’t think the company is worth $100. We think it is worth $50. Yeah, that has no basis in objective reality, but objective reality is that your company is worth $0 without the $100 in our collective pockets. What is it going to be? Give up 2/3 of the company, or go broke and get nothing.”)

How Do You Calculate The Equity Value of A Convertible Note?

OK, back to complicated magic. When the company takes outside investment, the convertible notes magically convert into stock, based on:

  • a) the valuation the company receives for the investment round  (higher numbers are better for both founders and angels)
  • b) a negotiated discount to the valuation, to reward the angel investor for his early faith in the company (higher numbers are better for angels)
  • c) possibly, a valuation cap (higher numbers, or no cap,  are better for founders)

For example, continuing with our “low numbers make math comprehensible” startup, let’s say it goes on a few months and is then raising a series A round, which basically means “the first time we got money from VCs”. We’ll say the VC and startup negotiate and agree that the company is worth $500 today, the VC is investing $250, ergo the VC gets a third of the company.

How much does our first $20 angel investor get? Well, he gets to participate like he was investing $20 today, plus he gets a discount to the valuation. So instead of getting $20 / $750 = 2.67% of the company, maybe he got a 20% discount to the valuation, so he gets $20 / (.8 * $750) = 3.33% of the company. (We’re ignoring the effect of interest here for simplicity, but he probably effectively has $21 and change invested by now in real life.)

After this is over, the convertible note is gone, and our angel investors are left with just shares (partial ownership of the company), which they probably hold until the company either goes IPO or gets bought by someone. So if the company later gets bought for $2,000 by Google, our intrepid angel investor makes $66 on his $20 investment.

How Does A Valuation Cap Work?

We haven’t discussed valuation caps yet. Valuation caps are intended to prevent the startup dragging its feet on raising money, thus building up lots of worth in the company, and then the angel investor getting cheesed. For example, if they had just grown through revenues for a year or two, they might be raising money at a valuation of $1,250. In that case, $20 only buys you 2% of the company (remember, he gets a 20% discount : $20 / (.8 * $1250) = 2%), which the angel investor might think doesn’t adequately compensate him for the risk he took on betting on a small, unproven thing several years before. So we make him a deal: he gets to invest his $20 at the same terms as the VCs do if, and only if, the valuation is less than $750. If it is more than $750, for him and only him, we pretend it was $750 instead. This means that under no circumstances will he walk away with less than $20 / (.8 * $750) = 3.33% of the company, as long as the company goes on to raise further investment. (Obviously, if they fold, he walks away with nothing. Well, technically speaking, with debt owed to him by a company which is bankrupt and likely has no assets to speak of, so essentially nothing.)
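The conversion arithmetic from the last two sections can be sketched in a few lines (interest ignored, as in the text; the names are illustrative):

```ruby
# Convert a note to a share of the company at the Series A: apply the cap
# if the negotiated valuation exceeds it, then apply the angel's discount.
def converted_share(invested, series_a_valuation, discount, cap = nil)
  effective = series_a_valuation
  effective = cap if cap && cap < series_a_valuation
  invested / (effective * (1 - discount))
end

converted_share(20.0, 750.0, 0.20)          # the 3.33% from the example
converted_share(20.0, 1250.0, 0.20)         # 2%: uncapped at the $1,250 round
converted_share(20.0, 1250.0, 0.20, 750.0)  # back to 3.33%: the cap kicks in
```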

Perhaps This Will Be Clearer With A Picture

Angels ultimately benefit from higher discounts to the valuation of the Series A round, and lower valuation caps.  Higher discounts, and higher effective discounts, mean you get more of the company for less money.  That is an unambiguous good, as long as you keep the quality of the company constant.

Let’s see how valuation caps affect how much of the company you end up with.  The better the company is doing by Series A time, the less of the company the angel ends up with.  This shows the incentive for the founders: do as well as you can prior to raising money, which is the same incentive founders always have.

As you can see from the below graph, a valuation cap essentially gives the angel an artificially higher discount if the Series A valuation exceeds the valuation cap.  Obviously then, it is in the interest of angels to negotiate as low a cap as possible, and in the interests of founders to negotiate a high cap or no cap at all. According to Paul Graham, this becomes the primary “pricing” mechanism in the new seed financing economy: if a founder wants to reward an angel, they award them with a lower cap.  If they don’t, the angels get a higher cap, or no cap at all.  This kicks discussions of valuations down the road a little bit, and allows you to simultaneously offer the company to multiple angels at multiple “price points”.  That allows you to reward them for non-monetary compensation (mentoring, having a big name, etc) or for early action on the deal.

This Is Not My Business. Take With A Grain Of Salt.

Lest anyone get the wrong impression, my familiarity with angel investing is very limited and, to the extent that it exists, it is mostly about angel investing in small town Japan.  (Oh, the stories I can’t tell.)  The above explanation is based on me processing what I’ve read and trying to prove that I understand it by explaining it to other people.  If I have made material errors, please correct me in the comments.

My current business is not seeking funding (and would be an extraordinarily poor candidate for it).  I’ll never say never for the future, but for the present, I rather like getting 100% of the returns.

[Edit: Want to use some or all of this, including the graphs, for your own purposes?  Go ahead.]

The Hardest Adjustment To Self Employment

I apologize in advance for spelling mistakes, because I am writing this on my iPad on the bullet train to Tokyo. Wonderful device, not so great for writing lengthy blog posts like my usual.

I am on the way to Tokyo because a high school friend is there this week. As soon as I heard, I told him to pick a day and I would be there. What day? Literally any day. My schedule is infinitely flexible.

That is what scares me the most about this job. Like most people, I have lived an entire lifetime conforming to schedules. They exist like the Greek gods: you didn’t ask for them but they are there, there is no negotiating with them, and prolonged association means you are likely to get your dignity violated by a bovine.

But schedules are structure, and structure helps. Be at school at nine AM. Seminar starts at 10:30, do not be late. Work starts at nine, Patrick, waltzing in at ten annoys people even if you are contractually permitted to do it and even if you will still be here at 2 AM. (Regardless of start time. If you thought the gods were rational, you have been reading the wrong mythology.)

At the very least, schedules put you and everyone else on the same page as to what you should be doing. Gainfully employed young men should be working at 2 PM on a Wednesday. I was having a late lunch and reading a novel at the coffee shop. The waitress asked me, confused, whether I was a student or not. Students have social license to do bugger all for a few years prior to working for a living. I told her I run a software company, and one of the perks is that I get to have lunch whenever. She was impressed, but asked how my customers and employees stand that.

That’s the thing about schedules: once you have one, everyone else needs one too, preferably as close to yours as possible. My customers do not share my schedule: most use my software when I am asleep, and mail me with their urgent issues at 3 AM Japan time. I spend lots of effort decoupling their happiness from my personal availability. This does wonderful things for site uptime but it also means, perhaps regrettably, that there is generally no compulsion to work today.

My freelancers largely don’t share my schedule either. I just got the front page for Appointment Reminder redone after several months with placeholder graphics. The extraordinarily talented designer I worked with (Melvin Ram at Volcanic Web Design, a web design company) “met” with me for a total of perhaps twenty minutes, and we worked (him on having graphical inspiration, me on wrangling it into a functioning product) in temporal isolation for the rest of the project. This is the arrangement I almost always use for freelancing. It is wonderfully productive except in that it lets me off the hook for causing forward progress.

That is the other part about scheduling: with the debatable exception of my consulting work, I am terrible about setting and keeping multi-week schedules for milestones. This never came up when I was employed, since I had managers to crack the whip and I avoided doing anything multi-week for my business, but it is now biting me with a vengeance. I wanted to have AR in beta six weeks ago. Between consulting, vacation, and BCC, I have made almost no forward progress on engineering.

I know that to be true for AR because code isn’t getting written, but I always assume it to be true for BCC, too. It turns out that I am smoking something: I ran a shell script to compare my productivity (commits, A/B tests, etc.) before and after quitting. I thought it would show me spinning my wheels. Turns out I am getting more done than ever. This is normally the point where I would paste a graph but, sorry, iPad. Suffice it to say I have run more A/B tests this summer than in the last year. (Interesting finding of today: Google Checkout really does increase conversion rates over having only Paypal as an option. I strongly suspected that, but now I know.)

Sales are up, too. Why doesn’t it feel this way? I think after a couple of decades of living by the clock I have become habituated to measuring my productivity that way. Insane and irrational, I know.

I am looking for ways to hack this: the discipline and social validation of having a schedule, without actually having to work at nine. I have been considering getting an office just to have mental separation between work and non-work. (They give them away in this town if you work in tech. I could pay the rent with the savings in my iced cocoa budget.) Plus, if I have an office, I have an offsetting factor the next time I am accused of being unemployed. Sounds funny until it comes from a police officer who does not quite understand immigration statuses. (You know that controversial immigration policy in Arizona? Don’t ask me my opinions about it around sharp implements. Suffice it to say I can vividly imagine what getting stopped under it would feel like.)

Another way is, and you might laugh, a little iPad app called EpicWin which gives me fake RPG loot for making progress. Will it work? No clue, but one week in, I seem to be getting more of my “boring” chores accomplished. (I had considered building this into my business for a while, but resisted because I thought it would cause neglect of my nonbusiness priorities. It turns out that, if anything, I have the opposite problem now that I have infinite scheduling flexibility.)

And I just earned 100xp for this blog post.

Dealing With Market Seasonality

One of the attractions of having a website is that it operates twenty-four hours a day, 365 days a year.  However, depending on what your market is, it might not operate evenly on all of those days.  The phenomenon of market seasonality is well-understood in offline businesses — ice cream shops do most of their business in the summer, retailers have a fourth-quarter sales spike — but understanding of it seems to be fairly limited among software companies.  Since market seasonality has non-trivial impact on many software businesses, I thought I would write about what I’ve learned.

Brief Background

I sell software which makes bingo cards to elementary schoolteachers, who overwhelmingly purchase the software with their own money directly in advance of the day they plan to use it.  This gives me a very quirky calendar relative to many software businesses.  Collapsing my sales chart for the last several years over the calendar year produces…

Average Sales Per Month

As you can see, I have a trough of sales every summer and a spike in October, among other seasonal anomalies.  More about why that is later.  This is just the sales graph, but almost any other statistic for my business — page views, signups, ad impressions, search traffic, etc etc — more or less tracks this curve, although that isn’t going to be the case for all businesses.

Know Your Market

I came into this situation more or less blind: prior to starting my business, if I had thought about it, I would have expected “Yeah, sales are probably going to pick up when the school year starts”, but the huge spike in October and the mini-trough in November/December were total surprises to me.  If possible, don’t be surprised.  Presumably you are an expert or near-expert in your particular problem domain, and should have a sense of what your business calendar probably looks like in your market.  If you aren’t, it is time to make a courtesy call to either customers or competitors and ask them.  (Sure, you can ask your competitors this.  Why not?  If you’re worried about being laughed off the phone, then find someone who is publicly traded and look at their quarterly earnings reports.  The numbers and explanatory notes will often give you all the information you need.)

For highly seasonal businesses, get used to the words year-over-year.  As in, sales in July for me were up 42% year-over-year, meaning that they were up 42% as compared to last July, as opposed to comparing them with this June.  This gives you a much more accurate picture of where the business is going, because for businesses which are not growing at truly meteoric rates, seasonal effects can very easily swamp the underlying trend.  (If you’re a viral Silicon Valley darling, you may be able to discount this.  However, if you’re a retailer, it is virtually impossible to have a January better than your December.  In fact, if it happens, it means you probably had a catastrophe.)
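
To make the arithmetic concrete, here is a minimal Ruby sketch of the year-over-year calculation.  The figures are invented for illustration, not my actual sales:

```ruby
# Year-over-year growth: compare a month against the same month LAST
# year rather than against the previous month, so that seasonal
# effects cancel out instead of swamping the underlying trend.
def yoy_growth(this_year, last_year)
  ((this_year - last_year) * 100.0 / last_year).round
end

monthly_sales = {
  "2009-07" => 1_000, "2009-12" => 3_000,
  "2010-07" => 1_420, "2010-12" => 3_300,
}

yoy_growth(monthly_sales["2010-07"], monthly_sales["2009-07"])  # => 42
# Comparing July against June instead would mostly measure the school
# calendar, not the health of the business.
```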

Cash Flow Management For The Off-Season

When people stop buying what you sell, you’ll tend to have less money.  That sounds vacuous, but it has important implications for your cash management.  It means that you’ll need to model your behavior on the ant, not the grasshopper: when times are easy, save up for when times are going to be difficult.

There are other options besides just banking money made during the on-season.  For many businesses, support loads, marketing activities, and the like all roughly coincide with the level of sales.  The number of emails I send regarding Bingo Card Creator falls by roughly 80% during the summer months.  This frees up a lot of time and focus in your organization.  One option to keep the lights on for software businesses is to do consulting or similar billable work during the off-season and go back to the product during the on-season — I took advantage of that this summer, to avoid breaking the piggybank too badly.

Another option, very popular among offline businesses but not used much online, is to do seasonal sales.  For example, if you sell software to teachers, you could do a back to school sale in August and, essentially, cut three weeks off of summer.  I did a 20% discount one year from late July through mid-August, which helped cushion the blow a bit.

Many SaaS startups are currently scratching their heads wondering “Why does this matter?”, because they get consistent revenue on a monthly basis regardless of when the customer signed up, and hence can’t conceive of having a growing user population and falling revenues.  This is yet another reason to envy businesses which sell software on a subscription basis.

Other Ways To Exploit The Off Season

If you find that your marketing, customer support, and what have you decline in the off season, and you don’t need to find someone to bill to keep the lights on, you have a variety of fun options:

  • Deploying new features, particularly ones which involve significant engineering changes (like wholescale site redesigns or business model changes), gets much easier when 80% of the user population is not actually on the site.  In the event you have an issue, the number of people affected by it is likely to be much smaller, and you’ll have a few months to get the kinks out prior to sales hitting their peaks again.  When I released the online version of my software last year, conversion rates plunged by about 30% while I was debugging the software and (more importantly, as it turns out) the marketing message.  Since it was the dog days of summer, though, that didn’t end up costing me all that much money (although it was a sock in the gut to have my first year-over-year decline ever).
  • For similar reasons, there is never a time like the present to start on a new project.  It might be worth doing one which either has a different seasonal cycle, or none at all, as this eliminates the cash flow challenge.  (On the other hand, having a built-in 3 month vacation isn’t the worst problem in the world.)

Communicating Seasonal Effects

To the extent that outside parties do not understand the seasonality in your market, you need to educate them.  Investors, for example, might assume if they come from a non-seasonal background that any dip in the “up and to the right” story means you are losing traction.  It is a virtual certainty, for example, that Groupon is going to sell fewer coupons in January than they will in December.  Having an article on TechCrunch saying “Groupon Growth Finally Puttering Out?” would not be positive for them (well, to the extent that TechCrunch coverage matters to a profitable business…).

Here are two very different graphs:

And here’s the same data, presented in another light:

The first business looks like it has had mostly flat sales for the last year, and a recent slump.  The second business looks like it is killing it.  They are the same business.  In the rather unlikely scenario that I was seeking investors, I know which of these two graphs I’d put in my slide deck.

Other presentational techniques to use:

  • Second derivative of (pick a metric: sales, signups, etc).  The story is “We’re growing faster all the time.”  (Note that, as you can tell from eyeballing the above graph, this is not actually true for me.)
  • Pick a metric where the seasonal effects are less apparent.  Conversion rates are wonderful for this, since often you’ll have an increased conversion during the off-season.  (If someone comes into an ice cream shop in December, that is because they really, seriously are in the mood to buy ice cream.  They’ll also find the place absolutely deserted.)

Plan In Advance For Peak Season, Too

The top of the year for educational bingo is in October, due to a massive influx of teachers wanting to play Halloween bingo at their class party.  (A brief aside: Halloween is considered a holiday of special importance to young children in America, but they aren’t given the day off, which means teachers want to do a fun activity in class to commemorate it.  It has also been mostly secularized, which means public schoolteachers face no career risk if they throw a Halloween party, whereas throwing an Easter party is a wee bit tricky.  You have to do it without mentioning Jesus and, simultaneously, without mentioning that you’re not mentioning Jesus.  Most teachers just have a chocobunny and hope for break to arrive.)

When I was young and stupid, I started planning for Halloween in October.  That works very poorly for Internet marketing: it will take months for your holiday-specific content (such as my page about Halloween bingo cards) to start ranking in the search engines where you want it to be.  If you do AdWords advertising, starting a campaign right before a holiday is likely to be disastrous, because a) if it gets plunged into approval purgatory you just missed your holiday (this has happened to me quite frequently) and b) your campaign won’t have time to build up the positive history which will get it the widest exposure and best costs.

For SEO projects, start several months in advance of the holiday or season you’re targeting.  For AdWords, I’d recommend at least a few weeks.  These schedules can be attenuated if you have a national-level brand to fall back on or a built-in tribe of evangelists like 37Signals has assiduously cultivated, but in general, more time is better.  (If this sounds like a whole lot of work, that is only because it is a whole lot of work, but you can have stupendously disproportionate returns to seasonal promotions.  I set records every year at Halloween even without ranking #1 for the term I was most interested in, and now that I am ranking #1 I am cautiously optimistic that I’ll absolutely smash my records this year.)

If you have another facet of this you’d like covered, or if you have some tactics which have worked for you, I’d love to hear about them in the comments.

Does Your Product Logo Actually Matter?

Some months ago when the 99designs fixed-price logo store launched, my buddy Thomas at Matasano remarked that it would have been the perfect choice for my business.  I said that, while I’m quite impressed with 99designs’ business model and many of the logos on offer, I thought the less-generic-looking logo which I had custom-made fairly cheaply (by the folks at Logo Samurai — I bought three logos and chucked the two I liked least) was better than a generic logo.  Thomas disagreed, and said that the more “professional” look of the off-the-rack logos would offset the genericness.

If Thomas and I were bigwigs at a multinational firm with entire departments devoted to preventing anything from getting accomplished, that disagreement might have been settled by a great deal of harrumphing.  However, we’re engineers, so we made a wager: I would purchase a logo handpicked by Thomas and A/B test it against my present logo sitewide, and whoever’s logo lost would donate $100 to a charity named by the victor.

The Original Logo:

The 99 Designs Logo:

Technical Details

This A/B test was done sitewide — on first access of the site, via landing page, organic search, or any other source, users were randomly assigned one of the two logos to be the “official” Bingo Card Creator logo.  They saw the same logo for all subsequent accesses.  This was done with A/Bingo, the Rails A/B testing framework I wrote.  This particular test required A/Bingo to be slightly extended to ignore bots, because — since the test was sitewide and visible prior to a bot-blocking login screen — the deluge of bots this site deals with was drowning out the signal with their non-converting noise.  Happily, since A/Bingo is open source, all other users of the software benefit from Thomas and me fighting over geek cred.

We agreed to measure conversions to the free trial rather than conversions to purchase.  Measuring conversions to purchase would have been technically feasible, but since much of my market is out of school during the summer, getting enough data to be statistically significant would probably have taken months.  Conversions to the trial are much more frequent so it was reasonable to expect sufficient data in a week or two.  The test was allowed to run for two weeks.

The Results

My original logo ended up winning by a virtual mile: 11.05% of all users seeing the classic logo converted, versus 9.92% of those seeing the 99designs logo.  Just by eyeballing that you might think “Hmm, 1% difference, that’s probably just noise”, but when you factor in the sample sizes (8,500 real people saw each variation) it turns out that the difference is statistically significant at the 99% confidence level.
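
For the curious, that significance check is a standard two-proportion z-test.  Here is a back-of-envelope Ruby sketch using approximate counts reconstructed from the rounded percentages above — this is my own illustration of the math, not A/Bingo’s actual code:

```ruby
# Two-proportion z-test: is the gap between two conversion rates
# larger than sampling noise alone would plausibly explain?
def z_score(conv_a, n_a, conv_b, n_b)
  p_a = conv_a.to_f / n_a
  p_b = conv_b.to_f / n_b
  pooled = (conv_a + conv_b).to_f / (n_a + n_b)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
  (p_a - p_b) / se
end

# Roughly 11.05% vs. 9.92% of about 8,500 visitors per variation:
z = z_score(939, 8_500, 843, 8_500)
# z comes out around 2.4, which exceeds 2.33, the one-tailed cutoff
# for significance at the 99% confidence level.
```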

(If those conversion numbers strike you as fairly low, remember that the test is running sitewide rather than on a single landing page.  Many folks who visit the site are drawn to free activities created for SEO purposes, and downloading those activities doesn’t necessarily require signing up.  Conversion rates for my AdWords landing pages are closer to 25 ~ 30%.  Relatedly, consolidated conversion rate is a poor metric to make decisions on, but that is a discussion for another day.)

Even though in this particular case the A/B test did not succeed in identifying a profitable change to make in my business, I consider it a wonderful use of my time — it only required $99, a few minutes of time from my local designer (to put the logo on the background I use for my header without it getting aliased), and a single line of code.  I can’t possibly stress enough how valuable it is to have your A/B tests require only a single line of code.  You’ll test more frequently, you’ll test more diligently, and you’ll discover more valuable things about your business.

Implications For Other Businesses

Graphic designers: Some of you appear to be highly opposed to spec work, at least partially because you believe it to produce inferior work.  You might consider measuring the effect of well-designed logos (for any value of “well-designed”, for example, “uses a business model I approve of”) versus other logos.  I have the graphical skill of a mole rat — I should be linking to this article on your blog, professional graphic designer, not writing it.

99designs: Don’t get me wrong, I love you guys.  My Appointment Reminder site uses an off-the-rack logo from your site that I’m very, very fond of.  I’m very, very open to the notion that there exist circumstances under which this test would have come down the other way.

Software companies: Your logo could potentially add or subtract 10% from enterprise value.  Bolded for emphasis because holy cow.  (10% higher conversion rate to a trial flows virtually directly to my bottom line.)  The majority of companies in our industry get a logo done early in their corporate life, generally exactly once.  For some companies, brand equity might be of paramount concern, and the risk of creating confusion in a two-week A/B test swamps even a 10% potential increase in conversion rates.  For those of us who are more transaction-oriented than brand-oriented, though, this is an easy, obvious, fairly inexpensive area to test.

(Obligatory “I passed Stats 101” disclaimer: the significance test I performed says that logo #1 is 99% likely to truly be better than logo #2, rather than saying that logo #1 is 99% likely to be 10% better than logo #2.)

SEO for Software Companies

This is a rough outline of my verbal remarks while giving a presentation at the Software Industry Conference.  Regular readers of my blog will notice it has a lot of overlap with previous posts on the topic, but I thought posting it would save presentation watchers from having to take copious notes on URLs and expand the reach of the presentation to people who couldn’t attend this year.

Brief Biography

My name is Patrick McKenzie.  For the last six years I have been working in Japan, primarily as a software engineer.  Along the way, I started a small software company in my spare time, at about five hours a week, which recently allowed me to quit my day job.  Roughly half my sales and three quarters of my profits come as a result of organic SEO, and the majority of the remainder come from AdWords.  If you need to know about AdWords, talk to Dave Collins, who is also attending. This presentation is about fairly advanced tactics — if you need beginner-friendly SEO advice, I recommend reading SEOMoz’s blog or taking a look at SEOBook.

Bingo Card Creator makes bingo cards for elementary schoolteachers.  This lets them, for example, teach a unit on chemistry and then, as a fun review game, call out the names of compounds like “ozone” and have students search out the chemical symbols on their bingo cards.  Students enjoy the game because it is more fun than drilling.  Teachers enjoy the game because it scales to any number of students and slides easily into the schedule.  However, making the cards by hand is a bit of a pain, so they go searching on the Internet for cards which already exist and find my website.  I try to sell them a program which automates card creation.

SEO can be a powerful tool for finding more prospects for your business and increasing sales.

SEO In A Nutshell

People treat SEO like it is black magic, but at the core it is very simple: Content + Links = You Win.

Content: Fundamentally, users searching are looking for keywords, and Google wants to send searchers to content which is responsive to the intent of the searcher.  Overwhelmingly, this means to content directly responsive to the keywords.  This is particularly true on the long tail, meaning queries which are not near the top of the query frequency distribution.  Many more people search for “credit cards” than for “How do I make a blueberry pie?”

For the most popular queries, the page that ranks will likely not be laser-targeted on “credit cards”.  However, for the long tail, a page that is laser-targeted will tend to win if it exists.  The reason is that Google thinks that your wording carries subtle clues of your intent, so it should generally be respected.  Someone looking for “How to make a blueberry pie?” isn’t necessarily as sophisticated a cook as someone who searches for “blueberry pie recipe” — they might not even be looking with the intention of making a blueberry pie, but rather out of curiosity as to how it is made, and so a recipe does not directly answer their intent.

Links: With billions of pages on the Internet, there needs to be a way to sift the wheat from the chaff and determine who wins out of multiple close pages.  The strongest signal for this is how trusted a site and a page are, and this is overwhelmingly measured by links.  A link from a trusted page to another page says “I trust this other page”, and the aggregate graph shows you which pages are most trusted on the Internet.  Note that trust is used as a proxy for quality because it is almost impossible to measure quality directly.

It is important to mention that links to one bit of content on your site help all other content — perhaps not as much as the linked content, but still substantially.  Wikipedia’s article on dolphins doesn’t necessarily have thousands of links pointing to it, but over their millions of articles like the History of the Ottoman Empire, they have accumulated trust sufficient that a new page on Wiki is assumed to be much better than a new page on a hobbyist’s blog.  Note that because Wiki ranks for nearly everything, they tend to accumulate new citations when people are looking for someone to cite.  This causes a virtuous cycle (for Wiki, anyway): winners win.  You’ll see this over and over in SEO.

Despite this equation looking additive, SEO very rarely shows linear benefits.  Benefits compound multiplicatively or exponentially.  Sadly, many companies try to develop their SEO in a linear fashion: writing all content by hand, searching out links one at a time, etc.  We’ll present a few techniques to do it more efficiently.

The Biggest Single Problem

The biggest single problem with software companies’ SEO is that they treat their websites like second-class citizens.  The product gets total focus from a team of talented engineers for a year, and then the website is whipped up at 2 AM on release day and never touched again.  You have to treat your website as if it were a shipping software product of your company.

It needs:

  • testing
  • design
  • strategic thought into feature set
  • continuous improvement
  • for loops

“For loops?”  Yes, for loops.  You’d never do calculations by counting on your fingers when you have a computer available to do them for you.  Hand-writing all content in Notepad is essentially the same.  Content should be separated from presentation — via templates and the like — so that you can reuse both the content and the presentational elements.  Code reuse: not just for software.
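
To make “for loops” concrete, here is a minimal Ruby/ERB sketch that stamps one hand-written template out across several topics.  The topics and copy are invented for illustration:

```ruby
require "erb"

# One hand-written template, many pages: the content (topics) is
# separated from the presentation (the template), so both are reusable.
PAGE = ERB.new(<<~HTML)
  <title><%= topic %> Bingo Cards</title>
  <h1>Free <%= topic %> Bingo Cards</h1>
  <p>Print <%= topic.downcase %> bingo cards for your next lesson.</p>
HTML

topics = ["Halloween", "American Revolution", "Baby Shower"]
pages = topics.map { |topic| PAGE.result(binding) }

pages.first.include?("Halloween Bingo Cards")  # => true
```

Three pages from one template here; the same loop happily produces three thousand.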

Scalable Content Generation

Does anyone have a thought about how large a website’s optimal size should be?  10 pages?  A hundred pages?  No, in the current environment, the best size for a website is “as large as it possibly can be”, because of how this helps you exploit the long tail.  As long as you have a well-designed site architecture and sufficient trust, every marginal topic you cover on your website generates marginal traffic.  And if you can outsource or automate this such that the marginal cost of creating a piece of content is less than the marginal revenue received from it, it makes sense to blow your website up.

This is especially powerful if you can make creation of content purely a “Pay money and it happens” endeavor, which lets you treat SEO like a channel like PPC: pour in money, watch sales, laugh all the way to the bank.  The difference is that you get to keep your SEO gains forever rather than having to rebuy them on every click like PPC.  This is extraordinarily powerful if you do it right.  Here’s how:

Use a CMS

The first thing you need to enable scalable content generation is a CMS.  People need to be able to create additional content for the website without hand-editing it.  WordPress is an excellent first choice, but you can get very, very creative with custom CMSes for content types which are specific to your business if you have web development expertise.

Note that “content” isn’t necessarily just blog posts.  It is anything your customers get value out of, anything which solves problems for them.  That could be digitizing your documentation, or answering common questions in your niche (“How do I…” is a very common query pattern), or taking large complex data sets and explaining their elements individually in a comprehensible fashion.  Also note that it isn’t strictly text: you can do images and even video in a scalable fashion these days.

For example, using Flickr Creative Commons search, you can tap millions of talented photographers for free to get photos, so illustrating thousands of pages is as simple as searching, copying, and crediting the photographer.  You can use GraphicsMagick or ImageMagick to create or annotate images algorithmically.  You can use graphing libraries to create beautiful graphs from boring CSV files — more on that later.
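
As a sketch of what “annotate images algorithmically” can look like, the following Ruby builds ImageMagick `convert` commands which stamp a photographer credit onto each illustration.  The filenames and credit format are invented, and only the command strings are constructed here, since actually executing them requires ImageMagick to be installed:

```ruby
require "shellwords"

# Build an ImageMagick `convert` invocation that draws a photographer
# credit in the bottom-right corner of an image.  system(cmd) would
# then write the annotated copy to the output file.
def credit_command(input, output, photographer)
  ["convert", input,
   "-gravity", "southeast",
   "-pointsize", "14",
   "-annotate", "+5+5", "Photo: #{photographer} (CC)",
   output].shelljoin
end

cmd = credit_command("pie.jpg", "pie-credited.jpg", "Jane Doe")
# Loop this over a directory of Creative Commons photos and you have
# credited every photographer on thousands of pages in one pass.
```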

The reason to use a CMS is that it makes content easy to create and edit, so you’ll do more of it.  Additionally, by eliminating the dependency on the webmaster, you can have non-technical employees or freelancers create content for you.  This is key to achieving scale.  You can also automate on-page SEO optimization — proper title tags, interlinking, etc — so that content creators don’t have to worry about it themselves.

Outsource Writing

You are expensive.  English majors are cheap.  Especially in the current down economy, stay at home moms, young graduates, the recently unemployed, and many other very talented folks are willing to write for cheap, particularly from home.  This lets you push the marginal cost of creating a new page to $10 ~ $15 or lower.  As long as you can monetize that page at a multiple of that, you’ll do very well for yourself.  Demand Media is absolutely killing it with this model.

Finding and managing writers is difficult.  If you use freelancers and find good ones, hold onto them for dear life, since training and management are your main costs.  Standardize instructions to freelancers and find folks who you can rely on to exercise independent thinking.

You can also get content created as a service, using TextBroker.  Think of the content on your website as a pyramid: you have a few pages handwritten by domain experts with quality off the charts, and then a base of the pyramid which is acceptable but perhaps not awe-inspiring.  At the 4 star quality level, you can get content in virtually infinite quantity at 2.2 cents per word.  You can either have someone copy/paste this into your website or do a bit of work with their API and automate the whole process.

You can use software to increase the quality of outsourced content.  For example, putting a picture on it automatically makes it better.  You can automate that process so your editors can quickly do it for all pages.  You can remix common page elements — calls to action, etc — which are polished to a mirror shine, with the outsourced content.  You can also mix content from multiple sources to multiply its effectiveness: if you have 3 user segments and 3 features they really value, that might be 9 pages.  (If you use 2 features per page, that is 18.  As you can see, the math gets very compelling very quickly.)
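
The remixing arithmetic is easy to sketch in Ruby — the segment and feature names below are invented for illustration:

```ruby
# Cross 3 segments with 3 features for 9 single-feature pages, then
# cross the segments with every pair of features for 9 more: 18 pages
# of unique combinations from only 6 pieces of underlying content.
segments = ["Teachers", "Parents", "Event Planners"]
features = ["Custom Word Lists", "Printable PDFs", "Online Play"]

one_feature_pages = segments.product(features)
two_feature_pages = segments.product(features.combination(2).to_a)

one_feature_pages.size                           # => 9
one_feature_pages.size + two_feature_pages.size  # => 18
```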

Milk It

Now that you’re set up to do content at scale, you can focus on doing it well.  The best content is:

Modular: You can use it in multiple places on the website.  You paid good money for it.  If you can use it in two places, the cost just declined by half.

Evergreen: The best possible value for an expiration date is “never”.  Chasing the news means your content gets stale and stops providing value for the business.  Answering the common, recurring, eternal problems and aspirations of your market segment means content written this year will never go out of style.  That lets you treat content as durable capital.  Also, because it tends to pick up links over time, it will get increasing traffic over time.

The first piece of content I made for my website took me two hours to write.  It made $100 the first month.  Not bad, but why only get paid once?  It has gone on to make me thousands over the years, and it will never go out of style.

Competitively Defensible: One of the tough things about blog posts is that any idiot can get a blog up as easily as you can.  Ideally, you want to focus on content which other people can’t conveniently duplicate.  OKCupid’s blog posts about dating data are a superb example of this: they use data that only they have, and they’ve made themselves synonymous with the category.  No wonder they’re in the top 3 for “online dating”.  Proprietary data, technical processes which are hard to duplicate, and other similar barriers establish a moat around your SEO advantage.

Process-oriented: If something works, you want to be able to exploit it in a repeatable fashion.  Novelty is an excellent motivational factor and you shouldn’t lose it, but novelty that can be repeated is a wonderful thing to have.  You also want to have a defined step where you see what worked and what didn’t, so that you can improve your efforts as you go on.

Tracking

Track what works!  Do more of that!  Install Google Analytics or similar to see what keywords bring people to your site.  Keywords (or AdWords data) are great sources of future improvements.  Track conversions based on landing page, and create more content based on the content which is really winning.  If content should be winning but isn’t, figure out why for later iterations — maybe it needs more external or internal promotion, a different slant, a different target market, etc.

Case Study

Getting into the heads of my teachers for a moment — a key step — most teachers have a lesson planned out and need an activity to slot into it.  For example, they know they have a lesson about the American Revolution coming up.  Some of them, who already like bingo, are going to look for American Revolution bingo cards.  If my site ranked for that, that would be an opportunity to tell them that they could use software to create not just American Revolution activities but bingo for any lesson if they just bought my software.

So I made a CMS which, given a list of words and some explanatory text, would create a downloadable set of 8 bingo cards (great for parents, less great for teachers) on that topic, make a page to pitch that download in, and put an ad for Bingo Card Creator on the page.  Note how I’m using this content to upsell the user into more of a relationship with me: signing up for a trial, giving me their email address, maybe eventually buying the software.

I have a teacher in New Mexico who produces the words and descriptions for me.  The pages end up looking like this for the American Revolution.  She produces 30 activities a month for $100, and I approve them and they go live instantly.  This has been going on for a few years.  In the last year, I’ve started doing end-to-end conversion tracking, so I can attribute sales directly to the initial activity people started with.

This really works.  Some of the activities, like Summer bingo cards or Baby Shower Bingo cards, have resulted in thousands of dollars in sales in the last year.  $3.50 in investment, thousands in returns.  And there is a long tail of results:

This graph shows the 132 of the roughly 900 activities which generated a sale in the last year.  You can see that there is a long tail of activities which each generated exactly one sale: a hundred of them, in fact.  Sure, you might not think that Famous Volcano bingo cards would be that popular, but I'll pay $3.50 to get a $30 sale as often as the opportunity is offered.  These will also continue producing value in coming years, as they already have over the last several: note that roughly half of the activities which produced a sale in the last 12 months were written in 2007 or 2008.

This took only a week or two to code up, and now takes 5 minutes a month: sending my check and a thank-you note to the freelancer.  I've paid her about $3,000 over the last few years to write content.  In the last year alone, it has generated well over $20,000 in sales.  If you do things this efficiently, SEO becomes a channel like PPC: put in a quarter, get out a dollar, redeploy the profits to increase growth.
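Restating the back-of-the-envelope arithmetic, using the approximate figures above:

```python
# Approximate figures restated from the text above
total_paid = 3000           # paid to the freelancer over several years
sales_last_year = 20000     # sales attributed to this content in one year
activities = 900            # activities produced to date

cost_per_activity = total_paid / activities       # a few dollars each
return_multiple = sales_last_year / total_paid    # multiple of content spend

print(f"${cost_per_activity:.2f} per activity, {return_multiple:.1f}x return")
```

That return multiple counts only the most recent year of sales, while the content keeps producing indefinitely.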

Any software company can create content like this, with a bit of strategic thinking, some engineering deployed, and outsourced content creation.  Try it — you can do an experiment for just a few hundred dollars.  If it works, invest more.  (Aaron Wall says that one of the big problems is that people do not exploit things that work.  If you’ve got it, flaunt it — until it stops working.)

Linkbait

Linkbait is content created specifically to solicit links to your website.  It works by exploiting the psychology of users: they show things to friends because they agree with them strongly, or because they hate them.  People create links because doing so creates value for them, not for you: it increases their social status, it flatters their view of the world, it solves their problems.

All people are not equal on the Internet: twenty-something technologists in San Francisco create hundreds of times more links per year than retired teachers in Florida.  All else being equal, it makes sense to target more of your linkbait at heavy-linking groups.  SEOMoz has labeled them the Linkerati, and I highly recommend their entire series of posts on the subject.

Software developers have some unique, effective ways to create linkbait.  For example:

Open Source Software

OSS developers and users are generally in very link-rich demographics.  OSS which solves problems for businesses tends to pick up links from, e.g., consultants deploying it — they will cite your website to justify their billing rate.  That is a huge win for you.  There are also numerous blogs which cover practically everything which happens in OSS.

OSS is fairly difficult to duplicate as linkbait, because software development is hard.  (Don't worry about people copying it: you'll be the canonical source, and the canonical source for OSS tends to win the citation link.  Make sure that canonical page is on your site rather than on Github, etc.)

OSS in new fields in software — for example, Rails development the last few years — has landgrab economics.  The first semi-decent OSS in a particular category tends to win a lot of the links and mindshare.  So get cracking!  And keep your eyes open for new opportunities, particularly for bits of infrastructural code which you were going to write for your business needs anyhow.

Case Study: A/Bingo

I’m extraordinarily interested in A/B testing, and wanted to do more of it on my site.  At the time, there was no good A/B testing option for Rails developers.  So I wrote one.  It went on to become one of the major options for A/B testing in Rails, and was covered on the official Rails blog, Ajaxian, and many other fairly authoritative places on the Internet.  It is probably the most effective source of links per unit effort I’ve ever had.

Some tactical notes:

  • Put it on your website.  You did the work, get the credit for it.
  • Invest in a logo — you can get them done very cheaply at 99designs.  Pretty things are trusted more.
  • Spend time “selling” the OSS software.  Documentation, presentation of benefits, etc.
  • OSS doesn’t have to be a huge project like Apache.  You can do projects in 1 day or 1 week which people will happily use.  (Remember, pick things which solve problems.)

Conclusion

I’m always willing to speak to people about this.  Feel free to email me (patrick@ this domain).

Speaking at Software Industry Conference

I’m currently in Dallas at the Software Industry Conference, where I’ll be giving a presentation about SEO strategies on Saturday.  In the meanwhile, if you’re at the conference or feel like coming out to the Hyatt Regency, feel free to get in touch with me.

As you have probably guessed, I’ll be posting the presentation and some textual elaboration on it right after I finish delivering it.  (I don’t know if video will be available this time.)

Running Apache On A Memory-Constrained VPS

Yesterday about a hundred thousand people visited this blog due to my post on names, and the server it was on died several fiery deaths.  This has been a persistent issue for me in dealing with Apache (the site dies nearly every time I get Reddited, with only about 10,000 visitors each time, which shouldn't be a big number on the Internet), but no amount of enabling WordPress cache plugins, tweaking my Apache settings, upgrading the VPS' RAM, or Googling led me to a solution.

However, necessity is the mother of invention, and I finally figured out what was up yesterday. The culprit: KeepAlive.

Setting up and tearing down HTTP connections is expensive for both servers and clients, so Apache keeps connections open for a configurable amount of time after it has finished a request.  This is an extraordinarily sensible default, since the vast majority of HTTP requests will be followed by another HTTP request: fetch the dynamically generated HTML, then start fetching linked static assets like stylesheets and images, and so on.  On my site, for example, loading a page takes 43 requests, and 42 of them are not the last request within a 3-second interval.  KeepAlive is a huge throughput win.  However, if you're running a memory-constrained VPS and get hit by a huge wave of traffic, it will kill you.
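For reference, these are the directives in question (on Debian/Ubuntu they live in /etc/apache2/apache2.conf); the 3-second timeout shown here mirrors the value I was running with, not necessarily the stock default:

```apache
# Hold connections open between requests from the same client
KeepAlive On
# Requests allowed per persistent connection
MaxKeepAliveRequests 100
# Seconds a worker idles waiting for the client's next request
KeepAliveTimeout 3
```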

When I started getting hit by the wave yesterday, I had 512MB of RAM and a cap (ServerLimit = MaxClients) of 20 worker processes to deal with them.  Each worker was capable of processing a request in a fifth of a second, because everything was cached.  This implies that my throughput should have been close to 20 workers * 5 requests a second * 60 seconds = 6,000 satisfied clients a minute, enough to withstand even a mighty Slashdotting.  (That is a bit of an overestimation, since there were also static assets being requested with each hit, but to fix an earlier Reddit attack I had manually hacked the heck out of my WordPress theme to load static assets from Bingo Card Creator's Nginx, because there seems to be no power on Earth or under it that can take Nginx down.)

However, I had KeepAlive on, set to three seconds.  This meant that for every 250ms a worker spent streaming cached content to a client, it spent 3 seconds sucking its thumb waiting for that client to come back and ask for something else.  In the meantime, other clients were stacking up like planes over O'Hare.  The first twenty clients get in and, from the perspective of every other client, the site totally dies for three seconds.  Then the next twenty clients get served, and the site continues to be dead for everybody else.  Cycle, rinse, repeat.  The worst part was that people were joining the queue faster than their clients were either getting served or timing out, so it was essentially a denial-of-service attack caused by the default settings.  The throughput of the server went from about 6,000 requests per minute to about 380 requests per minute.  380 is, well, not quite enough.
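The arithmetic behind those two figures, using the numbers above (a fifth of a second to serve a cached page, plus a 3-second KeepAlive wait tacked onto each request):

```python
workers = 20
serve_time = 0.2        # seconds for a worker to stream a cached page
keepalive_wait = 3.0    # seconds the worker then idles on the open connection

# Without KeepAlive, a worker turns over a request every 0.2 seconds
without_keepalive = workers / serve_time * 60                  # 6,000 per minute

# With KeepAlive, each request ties a worker up for 3.2 seconds
with_keepalive = workers / (serve_time + keepalive_wait) * 60  # roughly 380 per minute

print(without_keepalive, with_keepalive)
```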

Thus the solution: turning KeepAlive off.  This caused CPU usage to spike quite a bit, but since the caching plugin was working, it immediately alleviated all of the user-visible problems.  Bingo, done.

Since I tried about a dozen things prior to hitting on this, I thought I'd quickly write them down in case you are an unlucky sod Googling for Apache settings for your VPS, possibly Ubuntu Apache settings, or that sort of thing:

  • Increase VPS RAM: Not really worth doing unless you’re on 256MB.  Apache should be able to handle the load with 20 processes.
  • Am I using pre-fork Apache or the worker MPM? If you're on Ubuntu, you're probably using pre-fork Apache, and worker MPM settings will be totally ignored.  You can check this by running apache2 -l.  (The MPM is chosen at compile time and can't be altered via the config files, so if, like me, you just apt-get your way around getting common programs installed, you're likely stuck with pre-fork.)
  • What should my pre-fork settings be then?

Assuming 512 MB of RAM and you are only running Apache and MySQL on the box:

<IfModule mpm_prefork_module>
StartServers          2
MinSpareServers       2
MaxSpareServers       5
ServerLimit          20
MaxClients           20
MaxRequestsPerChild  10000
</IfModule>
You can bump ServerLimit and MaxClients to 48 or so if you have 1GB of RAM.  Note that this assumes you're using a fairly typical WordPress installation and that you've tried to optimize Apache's memory usage.  If you see your VPS swapping, move those numbers down (and restart Apache) until it stops swapping.  Apache being inaccessible is bad, but swapping might slow your server down badly enough to kill even your SSH connection, and then you'll have to reboot and pray you can get in fast enough to tweak settings before it happens again.
  • How do I tweak Apache’s memory usage? Turn off modules you don’t need.  Go to /etc/apache2/mods-enabled.  Take note of how many things there are that you’re not using.  Run sudo a2dismod (name of module) for them, then restart Apache.  This literally halved my per-process memory consumption last night, which let me run twice as many processes.  (That still won’t help you if KeepAlive is on, but it could majorly increase responsiveness if you’ve eliminated that bottleneck.)  Good choices for disabling are, probably, everything that starts with dav, everything that starts with auth (unless you’re securing wp-admin at the server layer — in that case, enable only the module you need for that), and userdir.
  • What cache to use? WordPress Super Cache.  Installs quickly (follow the directions to the letter, especially regarding permissions), works great.  Don’t try to survive a Slashdotting without it.
  • Any other tips?  Serve static files through Nginx.  Find a Rails developer to explain it to you if you haven’t done it before — it is easier than you’d think and will take major load off your server (Apache only serves like 3 requests of the 43 required to load a typical page on my site — and two of those are due to a plugin that I can’t be bothered to patch).
  • My server is slammed and I can't get into the WordPress admin to enable the caching plugin I just installed:  Make sure Apache's KeepAlive is off.  Then change the access directives in your Apache configuration to

<Directory /var/www/blog-directory-getting-slammed-goes-here>
Options FollowSymLinks
AllowOverride All
Order deny,allow
Deny from all
Allow from <your IP address goes here>
</Directory>

This will have Apache simply deny requests from clients other than yourself.  (Although if KeepAlive is on, Apache will still hold each denied client's connection open so that it can deny their next request promptly, which does nobody a lick of good; again, don't use KeepAlive.)  That should let you get into the WordPress admin to enable and test caching.  After doing so, you can switch to Allow from all and then test to see whether your site is now surviving.

Sidenote: If you can possibly help it, I recommend Nginx over Apache.  I use Apache because a couple of years ago it was not simple to use Nginx with PHP.  This is no longer the case.  The default settings (or whatever you've copied from the My First Rails Nginx Configuration you just Googled) are much more forgiving than Apache's defaults.  It is extraordinarily difficult to kill Nginx unless you set out to do so.  apache2.conf, on the other hand, is a whole mess of black magic with subtle interactions that will kill you under plausible deployment scenarios, and the official documentation has copious explanations of what the settings do and almost nothing regarding why or how you should configure them.
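As an illustration of how little there is to get wrong, a minimal Nginx server block for serving static files looks something like this (the domain and paths are placeholders, not my actual configuration):

```nginx
server {
    listen 80;
    server_name static.example.com;
    root /var/www/static;

    location / {
        # Let browsers cache static assets hard, and skip logging them
        expires 30d;
        access_log off;
    }
}
```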

Hopefully, this will save you, brave Googling blog owner from the future, from having to figure this out by trial and error while your server is down.  Godspeed.