The Final Update

Okay, all the people who had still not gotten their refunds was starting to seem a little weird, so after further investigation yesterday, I think we’ve finally got things completely fixed.

It turns out, there was a glitch in our new PayflowPro.pm that resulted in only the first transaction in a single second actually going through! According to Paypal’s site, that PayflowPro.pm should be just a drop-in replacement for the old PFProAPI.pm… and it did seem to be, after changing two lines everything seemed okay.

However, there was one little difference. The new HTTPS interface requires you to pass a unique id for each transaction, and PayflowPro.pm generated that unique id as follows:

my $request_id=substr(time . $data->{TRXTYPE} . $data->{INVNUM},0,32);

The problem was, we never passed in the (optional) “INVNUM” field.. we had an invoice number, but we passed it in as the (also optional) “COMMENT1″. So, our “unique” request_id was pretty much just the current time (plus whether it was a sale or a credit)!

In my testing this didn’t fail, because I didn’t run multiple transactions in the same second. Also, they apparently still return the same old success code we test for when this happens! But when multiple biller services run in parallel on all our controllers, lots of transactions end up happening on the same second.

The Upside

It turns out of the actually closer to $9,600,000 we thought we mistakenly charged, only actually about 1/4 of them ever _actually_ hit people’s credit cards. Our system thought we charged them, and they received an email receipt, but that was where it ended. It turns out we actually billed “only” about $2,100,000 incorrectly.

The Downside

This bug still existed until late last night (around 4am).. so when we ran our super-refunder script, the same thing was happening. Only about 1/4 of the refunds successfully went through. This resulted in the following situation:

About 9/16th of our customers: weren’t actually billed OR actually refunded.
About 1/16th of our customers: were billed AND were refunded.
About 3/16th of our customers: were billed BUT WEREN’T refunded.
About 3/16th of our customers: weren’t billed BUT WERE refunded. (of course, nobody wrote in about it!)

Anyway, last night we fixed the bug (by passing our invoice in as INVNUM) and re-ran another fixer that took an actual log of successful transactions downloaded from our processor and cross-referenced everything with our system. This is what it did:

About 9/16th of our customers: marked their bill and refund as $0 amount.
About 1/16th of our customers: left everything alone.
About 3/16th of our customers: redid the refund.
About 3/16th of our customers: redid the charge.

Double checking now, there were no more of those glitches from before, so everything seems okay.

Once again, all the stuff mentioned in the last post still holds true (you may not see the correction on your statement yet, but if you call your processor they should see it coming, for REALs this time), and once again, I’m very sorry about this whole fiasco.

Sincerely,
Josh Jones

P.S. For people wondering how the “robust and stable” rebiller could have created multiple future charges for the same date… I guess I meant “robust and stable” in regards to normal use over the last ten years. It looks like in this case, when multiple instances were running in parallel on a future date, race conditions allowed some multiple charges for the same period to be created. That too should never happen again now that we don’t allow future bill dates.