Megabus: A Study in Guessability

I recently took two trips on Megabus coaches in the northeastern United States (from DC to NYC and back).

For those who haven’t had the pleasure of riding Megabus, they’re a coach company with two distinguishing features:

As far as I know, Megabus tickets can be bought in two ways: via their website, and at the bus stop itself (if a bus has open seats). When you buy a ticket via their website, you get a booking confirmation email containing two relevant codes: an order number and a reservation number.

Order numbers

For the two tickets I bought, my order numbers were formatted as seven character strings.

AGSMXSA
AGSMX2I

Considering that the first 5 characters are the same (AGSMX), my bet is that this is a simple counting-order increment (AAAAAAA to 9999999). As a result, it’s trivial to guess the order numbers of valid purchases (both in the past and future).
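Here is a rough sketch of that guess in Python. It assumes the alphabet runs A-Z and then 0-9, which is only my reading of "AAAAAAA to 9999999"; the `decode` and `encode` helpers are hypothetical names, not anything Megabus publishes.

```python
import string

# Assumed digit ordering: letters first, then numerals (36 symbols in total).
ALPHABET = string.ascii_uppercase + string.digits

def decode(order: str) -> int:
    """Interpret an order number as a base-36 integer under ALPHABET."""
    value = 0
    for ch in order:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value

def encode(value: int, width: int = 7) -> str:
    """Inverse of decode, left-padded with 'A' (the zero digit)."""
    chars = []
    for _ in range(width):
        value, rem = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

first, second = decode("AGSMXSA"), decode("AGSMX2I")
gap = second - first  # distance between my two orders, if the guess holds
neighbours = [encode(first + i) for i in range(1, 4)]  # candidate "next" orders
```

Under that assumed ordering my two order numbers sit 368 apart, and enumerating the ones around them is a one-line loop.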

Reservation numbers

My reservation numbers were longer, and were formatted as dash-separated components.

10-5586-122017-M21R-2100-WAS-NEW
20-6330-010318-M21R-1400-NEW-WAS

Ignoring content entirely, here’s a naive and pessimistic approximation of the pattern:

\d{2}-\d{4}-\d{6}-\w{4}-\d{4}-\w{3}-\w{3}

This yields a search space of 10^16 * 36^10 (counting \w as just uppercase letters and digits), which is infeasibly large. So let’s break it down.
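For the curious, the arithmetic behind that figure can be checked in a few lines. The 36-symbol alphabet is an assumption (uppercase letters plus digits, narrower than what `\w` really matches).

```python
import math

DIGITS = 10
ALNUM = 36  # assumption: A-Z plus 0-9, not the full \w character class

digit_positions = 2 + 4 + 6 + 4  # the four \d{n} groups in the pattern
alnum_positions = 4 + 3 + 3      # the three \w{n} groups in the pattern

naive_space = DIGITS ** digit_positions * ALNUM ** alnum_positions
print(f"{naive_space:.3e} candidates, about {math.log2(naive_space):.0f} bits")
```

That works out to somewhere between 10^31 and 10^32 candidates, far too many to enumerate.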

First of all, WAS-NEW and NEW-WAS are obvious: they’re the origin and destination, in that order. Since a hypothetical attacker would be looking to sneak onto a particular trip, we can assume that they’ll know these¹.

Similarly, 122017/010318 and 2100/1400 should be recognizable: they’re date and time codes, respectively. These can be found on Megabus’s website, and as such are also well-known.

M21R is (very slightly) more subtle. It’s the bus route, plus R for reserved seat. M21 happens to be the DC-NYC route. Megabus puts these numbers on the bus’s front LED panel, as can be seen in this picture:

That leaves only the first two components: 10-5586 and 20-6330 respectively. In the worst case, that’s a guess space of 10^6 — substantially better than the original worst case.
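The breakdown above can be sketched as a regular expression with named groups. The group names are my own labels for the components, and `split_reservation` is a hypothetical helper; none of this comes from Megabus.

```python
import re

# Same shape as the naive pattern, with a label attached to each component.
RESERVATION = re.compile(
    r"(?P<unknown_a>\d{2})-(?P<unknown_b>\d{4})-"
    r"(?P<date>\d{6})-(?P<route>\w{4})-(?P<time>\d{4})-"
    r"(?P<origin>\w{3})-(?P<destination>\w{3})"
)

def split_reservation(number: str) -> dict[str, str]:
    """Split a reservation number into its labelled components."""
    match = RESERVATION.fullmatch(number)
    if match is None:
        raise ValueError(f"unexpected format: {number!r}")
    return match.groupdict()

fields = split_reservation("10-5586-122017-M21R-2100-WAS-NEW")

# Everything an attacker can read off the schedule or the bus itself.
known = {k: v for k, v in fields.items() if not k.startswith("unknown")}

# What is left to guess: one decimal digit per unexplained position.
unknown_digits = sum(len(v) for k, v in fields.items() if k.startswith("unknown"))
unknown_space = 10 ** unknown_digits
```

With the five public fields fixed, only six decimal digits remain, which is where the 10^6 figure comes from.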

Unfortunately, that’s also where my investigation ended. I couldn’t figure out any pattern or format for the remaining six digits. Other people have posted their Megabus reservation numbers online, and theirs also lack an obvious pattern.

My best guess is that the first section is the ticket number within the order/session, and that the second is some kind of relative timestamp — that would explain why both increase chronologically.

Consequences

Ticket takeover through guessable identifiers

Right now, anybody can modify a Megabus trip online with just two pieces of information: an order or reservation number, and an email or last name.

Emails and last names are not private information. If you post on social media about a ticket you’ve just booked, you’ve given an attacker everything they need to hijack your ticket:

Even if they can’t determine your order number, posting the trip time and other details puts an attacker (on average) just 500,000 random guesses away from your reservation number. That’s still a lot, but I’m willing to bet that Megabus doesn’t rate-limit the API responsible for booking management²³.
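To put a rough time on those guesses, here is a back-of-the-envelope sketch. The rates are illustrative assumptions (5 guesses per second per client, 100 clients), and `seconds_to_half` is a hypothetical helper; it also assumes no guess is ever repeated.

```python
def seconds_to_half(space: int, guesses_per_second: float) -> float:
    """Time to cover half the space when guessing without repeats."""
    return (space / 2) / guesses_per_second

SPACE = 10 ** 6  # the six unexplained digits

single = seconds_to_half(SPACE, 5)        # one client at 5 guesses/s
botnet = seconds_to_half(SPACE, 5 * 100)  # assumed: 100 nodes at 5 guesses/s each

print(f"single client: {single / 3600:.1f} h, botnet: {botnet / 60:.1f} min")
```

A single slow client needs a bit over a day to reach even odds; spreading the same rate across a hundred nodes brings that down to minutes.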

Ticket spoofing through lack of validation

In my experience, Megabus is very lax about actually validating reservation numbers. Over the dozen or so Megabus trips that I’ve taken in the past few years, I can only remember my number being validated once (against a printout of names!). On every other Megabus ride I’ve been on I was allowed to enter the bus after a quick glance at my phone, presumably to check whether the code looks right (i.e., wasn’t obviously for the wrong route, date, or time).

As such, I could have easily modified the PDF to contain a legitimate trip prefixed by a completely bogus 6-digit ID. Nobody would be the wiser, and they wouldn’t even be able to kick me off the bus (in the event of it filling up) without doing an entire roll call.
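As a toy illustration of why a glance at a phone isn't validation: a check that only compares the human-readable parts accepts any prefix at all. `looks_right` is a hypothetical stand-in for that visual check, and the forged prefix is made up.

```python
def looks_right(number: str, date: str, route: str, time: str,
                origin: str, dest: str) -> bool:
    """Mimic a visual check: only the readable trip details are compared."""
    expected_tail = f"{date}-{route}-{time}-{origin}-{dest}"
    return number.endswith(expected_tail)

trip = ("122017", "M21R", "2100", "WAS", "NEW")

real = "10-5586-122017-M21R-2100-WAS-NEW"
forged = "99-0000-122017-M21R-2100-WAS-NEW"  # invented prefix, real trip details

print(looks_right(real, *trip), looks_right(forged, *trip))
```

Both numbers pass, because nothing in the check ever touches the only part that isn't public.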

This is more of a procedural/opsec failure than a real vulnerability, but it’s worth pointing out. Let computers handle validation; they’re better at it.

Conclusions

Don’t use guessable identifiers for transactions. This is especially true if you allow users to modify transactions with just an identifier and one other piece of public information (like an email address or last name). Use a long and random identifier to begin with, and also require real logins.
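A minimal sketch of what "long and random" can look like, using Python's standard `secrets` module. The function name and the 16-byte default are my own choices for illustration, not a recommendation specific to any booking system.

```python
import secrets

def new_booking_id(n_bytes: int = 16) -> str:
    """Return a URL-safe identifier built from n_bytes of OS-level randomness."""
    return secrets.token_urlsafe(n_bytes)

booking_id = new_booking_id()
print(booking_id)
```

Sixteen random bytes give 128 bits to guess rather than six decimal digits, and the identifier carries no trip details that could be read off a schedule.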

Do use QR codes or some other machine-friendly format for identifiers, along with a computerized check-in system. This doesn’t increase the security of the identifier (see above for that), but it enforces validation and makes it real-time (instead of discovering a fake identifier written in a logbook after the bus has already left).

¹ They might not know the abbreviations that Megabus uses, of course. However, these seem unlikely to change and are publicly enumerable.
² Currently, anyways.
³ And, as always, remember that guessing attacks can happen at scale: 5 guesses per second is slow, but 5 guesses per second on each node of a 100-node botnet means a 50% success rate in about 17 minutes.

ENOSUCHBLOG

Programming, philosophy, pedaling.