More Passwords

Here is The New York Times talking about frequent passwords. And here is TechCrunch, talking about the same incident. The basics of the situation: a site named RockYou did pretty much everything wrong in handling user passwords, which let other people get hold of all the user data for 32 million users. And a company called Imperva analyzed the password data to see what the most common passwords are.

From my point of view, the best thing either of the articles has to say is the TechCrunch line "Of course, it doesn't really matter how awesome your password was with RockYou, since they stored it all in clear text and lost control of the data." This is why using different passwords at different sites is important. Sturgeon's revelation applies to programmers: 90% of them, not that good. And that includes 90% of the people storing your passwords. The strength of your password only matters if it is actually secret.
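For contrast with clear-text storage, here is a minimal sketch of salted, deliberately slow password hashing using Python's standard library. The function names and the iteration count are assumptions picked for illustration, not anything RockYou (or anyone else mentioned here) actually ran.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor for illustration; a real deployment would tune this.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) so the clear-text password never has to be stored."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, digest)
```

Even with this in place, "123456" falls to the first guess, but a stolen table no longer hands over every password at once, and the per-password salt means two users with the same password do not get the same stored value.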

Sociologically, looking at the Times coverage, we can see that the RockYou users are young and sweet (or Imperva sanitized their data), since the names in the top 32 are relatively recent popular names and there are no obscenities. More to my recent point, a number of them aren't in the 500 worst password list; the best scoring of those is "rockyou", for obvious reasons, but "babygirl" (#13), "lovely" (#16), "chocolate" (#28), and "FRIENDS" (#32, which I looked for in lowercase) are all missing. (The Times and Imperva's own report differ on capitalization of some passwords; the 500 worst password list has clearly been flattened to all lowercase in any case.) Again, 1) Twitter didn't make their list of common passwords based on their own data, 2) the lists differ for different sites and times, and 3) avoiding passwords that are on a list is not a helpful tactic.

But also: dudes, why are we blaming the users, who signed up for a free service, instead of the people who were presumably paid to do a professional job of securing those passwords? Because using "123456" as a password is a pretty rational choice compared to pretty much any of RockYou's choices, starting with keeping the user passwords in a database directly accessible to the Internet, and continuing through a very long line of very well understood things you shouldn't do right through to the point where all their user data was stolen, they knew it, and it took them 10 days to mention it to their users.

As for why people still choose simple passwords, I love this explanation, which amounts to "Why on earth shouldn't they?" Believe me, the lock on your average desk drawer is the locksmithing equivalent of "12345", so it's not like this is an entirely electronic issue, either.

The NYT article suggests two less radical reasons: it could be genetic, or it could be because we have a growing number of passwords to deal with. The argument from genetics is clearly just an attempt to point out that this is what humans have always done. As for the growing number of passwords, that's difficult to follow: passwords have always been bad, they are still bad, the number of passwords we have keeps growing, so they're related? I know correlation doesn't imply causation, but that doesn't mean that lack of correlation implies causation. In fact, passwords are getting better, not worse, but as far as I can tell not because people like to pick better passwords. In days of yore, you could use "a" for your password. (People knew you shouldn't, but you could, and they did.) RockYou might not be bright enough to hash a password, any password, but they knew enough to make you pick one at least 5 characters long. In fact, it's pretty interesting that the most frequent password is "123456", when RockYou only required 5 characters. Why add the 6?

I can think of three reasons:

  1. The sheer number of passwords people have to deal with is making them smarter about passwords. It certainly makes more sense than using it to explain why passwords haven't gotten better.
  2. People are reusing passwords from sites that have a minimum length of 6, which is way more common than a minimum length of 5. Sites with a 6-character minimum often see repeated strings of 8 characters, which suggests that this sort of carry-over happens. (See for instance this uncensored and therefore obscenity-laden multi-incident analysis where no source had a length restriction higher than 6, and "11111111" is still a top 500 password.)
  3. The RockYou data is known to contain two kinds of passwords: passwords for RockYou itself and passwords for other sites. The number 32 million is the number of RockYou passwords that were exposed, so there is a strong implication that it is those passwords that were under analysis. But if all the passwords were analyzed, that dataset includes multiple passwords for the same users (in which case it's not surprising that there are so many repeats; users are known to reuse passwords), and it includes passwords for sites with a minimum length of 6. That would give "123456" a strong bump.

I'm not certain how you'd design an experiment to tell the difference between the first two hypotheses. Maybe the people who use "xxxxxxxx" when they could be using "xxxxxx" really do it because they know 8-character passwords are better than 6-character passwords, and that's just the only fact about passwords they ever picked up? Maybe all recent password improvements come from people being forced to pick better passwords often enough that they reuse strong passwords that were forced on them someplace? Who knows. But people do appear to, relatively often, use passwords that are stronger than they absolutely have to be. Fully 30 of the top 32 passwords are stronger than RockYou forced them to be; two mix letters and numbers; three are more than 8 characters long. So people are not just conforming to RockYou's minimums, and they are, presumably intentionally, picking "stronger" passwords.
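
The tally above is easy to reproduce against any password list. This is a rough sketch, assuming "stronger than required" just means "longer than the 5-character minimum"; the helper names are made up for illustration, and the sample contains only the handful of passwords quoted in this post, not the full top 32.

```python
MIN_LENGTH = 5  # RockYou's minimum length, as described above


def exceeds_minimum(password, min_length=MIN_LENGTH):
    """True if the password is longer than the site forced it to be."""
    return len(password) > min_length


def mixes_letters_and_digits(password):
    """True if the password contains at least one letter and one digit."""
    has_letter = any(c.isalpha() for c in password)
    has_digit = any(c.isdigit() for c in password)
    return has_letter and has_digit


def summarize(passwords, min_length=MIN_LENGTH):
    """Count how many passwords clear each (crude) strength bar."""
    return {
        "longer_than_minimum": sum(exceeds_minimum(p, min_length) for p in passwords),
        "letters_and_digits": sum(mixes_letters_and_digits(p) for p in passwords),
        "longer_than_8": sum(len(p) > 8 for p in passwords),
    }


# Only passwords quoted in this post; not the complete top-32 list.
sample = ["123456", "rockyou", "babygirl", "lovely", "chocolate", "FRIENDS"]
print(summarize(sample))
```

Length is a very crude stand-in for strength, which is why "stronger" gets scare quotes above.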

Back to that 5-character minimum: The Times shows "0" as the 24th most frequent password. That's not a password a RockYou user could set, so something's funny there. The database may have contained "0" entries, as a result of any number of kinds of database foolishness. (Most of them would lead you to the conclusion that RockYou wasn't all that bright about keeping passwords in SQL, but we knew that already.) Or that may have been "00000", "000000", or "00000000" before something helpful somewhere formatted it (this clearly happened to the 500 most common password list). Or, in fact, both kinds of errors may have happened at some point, and that could be all the entries with any number of 0s. But you shouldn't go about assuming that lots and lots of people have voluntarily and successfully selected "0" as a password.
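
To make the "something helpful formatted it" theory concrete, here is a small sketch of the kind of coercion I have in mind. The `helpfully_format` function is hypothetical; it stands in for any spreadsheet, import script, or database column that decides an all-digit string must be a number. I have no idea what tooling actually touched this data.

```python
from collections import Counter


def helpfully_format(value: str) -> str:
    """Mimic a tool that treats an all-digit string as a number.

    Hypothetical stand-in for whatever mangled the data; leading
    zeros disappear on the round trip through int.
    """
    if value.isascii() and value.isdigit():
        return str(int(value))
    return value


# Distinct passwords a RockYou user could plausibly have set, plus a
# literal "0" that could only come from bad data.
candidates = ["00000", "000000", "00000000", "0"]

# After "helpful" formatting they all pile up under a single key.
merged = Counter(helpfully_format(p) for p in candidates)
print(merged)
```

Every all-zero password lands in the same bucket, which is one way a password nobody could set ends up at #24.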