Um, Whoops.

The $7,500,000 finger.

Hello.. how’s your morning going?

I hope it’s been a little better than mine.

We had a teensy eensy weensy little billing error last night… my first clue something was up was when I saw this morning’s daily billing report (so far): $7,500,000.

It turns out that, due to my excessively fat fingers, nearly every one of our customers has been seriously over-billed in the last 12 hours.

I bet when you read this part of the last newsletter:

4. New Office!

Another important thing I’ve been doing instead of writing newsletters
is looking out the window of our NEW OFFICE:

http://www.dreamhost.com/dreamscape/2007/12/21/were-so-high-right-now-you-dont-even-know

If your next web hosting bill from us is mysteriously tripled, now you
know why.

.. you thought it was a joke!

Ha, the joke is on you! I guess. Um, okay, no, not really, I’m sorry.

How on earth could something like this happen?

Let Me Explain

A couple of weeks ago, just around New Year’s, we started beefing up some of our internal “controller” servers. These are the machines that run all of our “behind-the-scenes” services: everything from adding a user to registering a domain to configuring apaches to rebilling customers.

I was on a little-bit-too-long vacation, but when I got back, I noticed our daily credit card payments seemed a tad low in the new year.

So, late last week I tried re-running the billing services for all the days back three weeks or so. I knew this was safe, because after 10 years, the one thing you DO get perfect is your billing system. Our biller is pretty bug-free and robust at this point, because we’d be broke and eating bugs if it weren’t.

In fact, it’s so robust you can just run it on any day you want, and it’s safe. It won’t double-charge people and it’ll even automatically find any missing charges and catch everything up to the day you said.
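To make “safe to re-run” concrete, here is a minimal sketch of a catch-up biller. Everything in it is an assumption for illustration (a monthly billing cycle, the hypothetical `Account` class, and the hypothetical `run_biller` and `add_month` functions); it is not the actual biller code.

```python
from dataclasses import dataclass, field
from datetime import date


def add_month(d: date) -> date:
    """Return the same day one month later (sketch: assumes day <= 28)."""
    year = d.year + (d.month // 12)   # roll the year over after December
    month = d.month % 12 + 1
    return d.replace(year=year, month=month)


@dataclass
class Account:
    name: str
    next_due: date                    # first billing period not yet charged
    charges: list = field(default_factory=list)


def run_biller(account: Account, as_of: date) -> int:
    """Charge every period due on or before `as_of`; return how many were added."""
    added = 0
    while account.next_due <= as_of:
        account.charges.append(account.next_due)
        account.next_due = add_month(account.next_due)
        added += 1
    return added


# Hypothetical account whose 2007-12-20 charge was missed.
acct = Account("example", next_due=date(2007, 12, 20))
first = run_biller(acct, date(2008, 1, 14))   # catches up the missed period
second = run_biller(acct, date(2008, 1, 14))  # re-run: nothing left to charge
```

Because each account only remembers the next period it owes, running the biller twice for the same day adds nothing the second time. The catch is that the loop trusts whatever `as_of` date it is handed.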

Anyway, I ran it, and things were fine.. and sure enough, it caught a lot of missed payments. I didn’t have time to look into it right then, but I made a note to myself to check up on it on Monday (yesterday) and see if things were fine or still messed up.

And a terminal case it is.

Come Monday

Monday came. I checked the reports and sure enough, things were still pretty low. So I looked at the logs for some of the biller services, and I noticed they were only failing on the machines that had been recently upgraded!

That explained why we were getting some money still (since not all the controllers have been upgraded yet), but not all of it.

Anyway, it turned out there was no 64-bit version of the PFProAPI module we use to interface with the credit card transaction server. No big deal, there’s a new module that talks to their new and preferred https interface, and it was only a couple of lines of code to change to get us switched over!

So anyway, I made the change, and it worked, and I even tested it, and things were fine!

But then… late last night, I realized: when I re-ran those biller services last week, they must not have fixed everybody then either! It’s just that by running it again, some of the people who had been assigned an upgraded (and therefore broken) controller the first time randomly landed on a working one and got charged.

So why not just run it all one more time?

Sure, it should be no problem! So I did, manually running the biller (which is normally automatically scheduled) for 2008-01-14, 2008-01-13, 2008-01-12, 2008-01-11, 2008-01-10, 2008-01-09, 2008-01-08, 2008-01-07, 2008-01-06, 2008-01-05, 2008-01-04, 2008-01-03, 2008-01-02, and 2008-01-01.

I probably should have just stopped there. But then I thought better. I thought to myself, “When did we start upgrading these controllers anyway?”

I couldn’t remember. But, since the biller is super-safe and robust anyway, I went ahead and ran it for 2008-12-31, 2008-12-30, 2008-12-29, 2008-12-28, 2008-12-27, 2008-12-26, and 2008-12-25, just for the hell of it.

Notice Anything?

Don’t feel bad if you didn’t. I kind of missed it myself.

THOSE SHOULD HAVE BEEN 2007!!

Heh, uh.. um, er.. my bad?
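One way to dodge a typo like that is to let the computer count backward across the year boundary instead of typing each date by hand. A minimal sketch using Python’s standard `datetime` module; `days_back` is a hypothetical helper, not part of the actual biller:

```python
from datetime import date, timedelta


def days_back(start: date, count: int) -> list:
    """Return `count` ISO dates, starting at `start` and stepping back one day each."""
    return [(start - timedelta(days=n)).isoformat() for n in range(count)]


# The 21 days from 2008-01-14 back through Christmas.
run_dates = days_back(date(2008, 1, 14), 21)
```

Here `run_dates[13]` is `2008-01-01` and `run_dates[14]` is `2007-12-31`, so the year changes without anybody having to remember it.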

So what happened?

Well, that super-robust and stable biller did what it was programmed to do: it ran as though today was December 31st, 2008!

And what did it see? Well, it saw a whole lot of accounts (essentially all of them) who for some unknown, mysterious reason hadn’t been charged at all for eleven and a half months!

So off it went, busily through the night, “fixing” everything up for “today”, December 31st, 2008.

Really, it’s sort of amazing this never happened before in the last ten years.

We have a NEW SUPPORT RECORD!

There IS a bug here.

I can imagine the half second or so of thought that sprinted through the programmer’s mind when he was adding the ability to allow you to pass in what day to run the biller as though today is:

Hmm.. well, I could see us POSSIBLY wanting to be able to bill for a future date.

Well guess what… NO! We will NEVER want to rebill as though today were a day that hasn’t happened yet! But instead, somebody along the line (Sage? Me? Somebody else?) figured, “What’s the harm in keeping it flexible?”

About $7,500,000 in harm, that’s what!

The serious part.

The end to this story is that of course, I’m very very sorry, we’re very very sorry, and I’m sure you’re very very sorry this happened. I really am. I understand the sort of problems that an unexpected large charge to your credit card (or worse yet, your debit card) can cause. If the tone of this blog post seemed a little light, I apologize; I don’t mean to offend, and I realize how serious an issue this is. I’ve been up since 3:50am trying to undo the damage and maybe I’m a little shell-shocked.

A new service is running right now (in parallel on all the controllers) that fixes all those future charges, re-enables your account if it was erroneously suspended, and if your credit card was automatically rebilled, refunds the payment automatically. You don’t have to contact us or your bank, and you’ll get an email when your account is finished fixing up. It’s going to take several more hours to complete. There are (or were, after this incident) a lot of you these days!

If, because of this billing mistake, you somehow incurred some fees from your bank or credit card company, please let us know after tomorrow (today we are just replying to all 10,000+ billing messages with a generic explanation) and we’ll do our best to make it right for you.

And of course, the biller no longer allows dates in the future.
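How that check was written isn’t shown here. A minimal sketch of one way to do it, where `validate_run_date` and `FutureBillingDateError` are hypothetical names and the `today` parameter exists only so the check can be tested:

```python
from datetime import date
from typing import Optional


class FutureBillingDateError(ValueError):
    """Raised when someone asks the biller to run for a day that hasn't happened."""


def validate_run_date(as_of: date, today: Optional[date] = None) -> date:
    """Return `as_of` unchanged if it is today or earlier; otherwise refuse."""
    if today is None:
        today = date.today()
    if as_of > today:
        raise FutureBillingDateError(
            f"refusing to bill as of {as_of.isoformat()}: "
            f"that is after today ({today.isoformat()})"
        )
    return as_of


# Pretend it is the morning after.
today = date(2008, 1, 15)
ok = validate_run_date(date(2007, 12, 31), today)   # the date that was meant
try:
    validate_run_date(date(2008, 12, 31), today)    # the date that was typed
    rejected = False
except FutureBillingDateError:
    rejected = True
```

The point of the design is that the check runs before any charge is created, so a fat-fingered year fails loudly on the spot instead of quietly overnight.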

The moral of this story is that “flexibility” is rarely desired in programming! The less a program will accept/the less a program will do/the fewer options and preferences it has, the more usable it is/the more understandable it is/the more stable it is.

Tough Love

I wouldn’t want him to compile me!

When designing a program, you’ve got to make some tough decisions .. and when you really can’t decide if this is something your users will need someday, err on the side of leaving it out.

Otherwise, your users will someday err on the side of your face.