As we’re prone to say, “much ink has been spilt over the release of password digests” from LinkedIn and others. I’m, as is typical, profoundly disappointed in that amount of misinformation I’ve heard in security folks’ commentary on the problem and the underlying workings of digests, HMACs, and so forth. This blog entry represents a roll-up of a great discussion we had internally on our software security group mailing list.
This entry treats the issue and its mitigation more thoroughly than press / blogs have for purposes of secure design instruction. Other entries leave developers with a lot of questions and opportunity to mess things up.
This latest example reinforces what we know: Whether through injection or other means, attackers regularly succeed in bulk exfiltration of sensitive data.
This resonates immediately with many security practitioners. However, some organizations for which we do assessments still push back on this. Policy varies by organization, but if you’re committed to digesting with hashes, best practice entails:
Through this post, I present a solution that meets these requirements.
The first thing we all did was (1) change our passwords and (2) tell everyone we know to reset THEIR passwords. I recently explained to a customer, “When an individual’s password is stolen, you’ve often compromised their credentials at several sites. How many people have a single password that they use for several ‘low risk’ sites? When we deploy an application, we have a (metaphorically) fiduciary responsibility to protect users’ credentials.”
We quickly knew we were dealing w/ SHA1() unsalted. After an assessment, some application owners will indicate “Even if they could steal our HASH(PW) database, they couldn’t apply a rainbow table because they wouldn’t know our protocol/scheme.”
Obscurity provides some value to raising the difficulty of reversal, as long as it doesn’t weaken the cryptographic properties of a solution. It doesn’t always take a cryptanalyst (or even a security consultant) long to figure these things out. Attackers of public systems, such as LinkedIn, have access to freely create their own account (or accounts). With this access an attacker can craft their own PW,HASH(PW) tuple by creating / hacking their own account. This means they have a ready-made test to help prove their hash scheme hypotheses.
If you’re having trouble with the previous step, Google the recovered hashes, and you’ll see hits for MD5 and SHA1. These dictionaries can be used to bootstrap a salt-specific rainbow table.
Before we proceed towards a solution, let’s define rules of engagement by considering a fixed set of threats and attack vectors we’ll protect against. (T = Threat, AV = Attack Vector)
These vectors will help us couch the remainder of the discussion.
I spent some time to hack up a prototype solution, in Java, and share it here because I can’t find a good (concrete) example without limitations with regard to my stated best practices above. Two things are interesting:
Look at the solution’s simple form:
[STORED_FORM] = [SALT] + HASH ([SALT] + [PW])
Two issues remain: How do we:
[SALT Spec]
Specifying a single salt at the scope of the whole password database provides **NO** security under T1.AV1, AV2, or AV3. This conclusion holds regardless of salt size. Why? Because T1.AV3 presumes access to a single salted hash, which gives away the prefix needed to compute your new app-specific rainbow table. Put another way: Using the same salt for all passwords defeats the salt’s only purpose (de-duping cipher-/digest-texts). See (*1).
***Also remember: in case (T1.AV2) (a threat gets access to a victim’s digest) they will have to create a custom rainbow table to break reverse the hash, but creating this rainbow table is NO more difficult than creating one for an unsalted hash (because the salt is known).
So, do we need to protect the storage of the salt? That is, do we need to store it in a separate database table, or outside the database entirely? This fret falls into the category of obscurity: an attempt to slow an attacker down. Again, see (*1). There are more elegant ways to protect a bulk digest lift (T1.AV1) from reversal anyways.
[hash() length extension attacks]
HASH() length extension attacks have been covered (*2). Summarizing, we need to make sure that attackers can’t craft a conforming hash along particular (often internal or MiM) surfaces (as described by T1.AV3). Various authors describe the generic fix to this as:
[DIGEST] = HASH([KEY] + HASH([KEY] + [PW]))
The easier form of this is simply:
[SALT][DIGEST] = HMAC([KEY] + [SALT] + [PW])
I have chosen the latter construct in my implementation.
There are subtle and interesting differences between a solid hmac() implementation and the form above it. For the purpose of this entry, let’s continue confident that the chosen hmac will in fact (1) provide an inner and outer hash() (2) that construction controls length attacks, and (3) that same construction is designed to leak no information about the key through the digest-text.
This selection also means our implementation does not have to chain hashes together (*3). While hash chaining is a kind of obscurity that will provide value without negatively affecting other cryptanalytic properties, it also has a performance impact. I vehemently resist such design additions where alternatives exist.
[Misc Design Properties]
In order to conform to best practice and resist all the stated attack vectors, the design need a few more pushes:
At this point, we should have met our best practices and resisted the stated attack vectors. With only a limited amount of time to consider the solution/code, several great points sprung up internally that I’ll share.
[HMAC creates a Key Protection Problem]
Son-of-A… The referenced implementation does nothing to protect the HMAC secret key. Ultimately, this is because I whipped the code up in a few hours–it’s a prototype. I would fix this in a real fielded solution. In the meantime, a few mitigating factors:
The HMAC key, if stored only server-side, in an application container, it remains separate from the database itself, and will therefore not be exfiltrated by a SQLI attack. This keeps it separate and protected as long as its not compromised by some sort of stack trace / error handling problem within the Java code. In other words, T1.AV1, AV2, and AV3 should not compromise this key. In order to retain key secrecy under T1.AV4, the DB-ADMIN must be separate from the APPSERVER-ADMIN.
Thus, it does its job of introducing non-reversibility to the digest, by rainbow table or otherwise. Look at the key itself: it could stand to be longer and more diverse (character-wise) in order to resist brute-force discovery.
Without further modification, the HMAC continues to do its job of length-extension attack prevention.
What if design demands client-side digest ‘expose’ HMAC(K1) to attackers (under T1.AV2)? One might argue this still protects from MiM observation of plaintext credentials in cases where SSL/TLS fail (but not for public sites where anyone can become T1). The net of this circumstance is that the solution sacrifices the reversing hurdle of HMAC(k1) and backs out to:
Reversibility(O) = O(salt + hash())
Rainbow Table(Size) = NumberOf(Users)
…which isn’t bad but again isn’t best practice. It’s where we ‘started’. Ultimately, deploying an HMAC() solution to the client, for this reason, is probably unneeded complexity (See initial caveats plz).
In any case, please queue the music on additional AppSec hygiene: we’ll need to buy some “secure key storage”, “secure key backup”, and “key rotation”.
[HMAC creates a Key Rotation Problem]
Technically, this problem isn’t “unique” to the HMAC; plain hash digests possess it as well. The points are these:
I’ve grown to hate this HMAC thing
Fair enough. As alluded to early in the entry, adding security features can add complexity. Replace HMAC() with:
SALT+DIGEST = SHA256(NONCE + SHA256(NONCE + SALT + DATA))
We have to re-consider the threat model and applicable best practices AGAIN if we adopt this new solution. Let me provide some cache hits rather than the full explanation:
Why? cKetkar explains:
So the detailed HMAC form is –
H( (K XOR opad), H( (K XOR ipad), M) )
We could add in XOR operations into the HASH I’ve suggested above but there’s too much to get wrong/right there. Just use a HMAC or live with what I specified. This quickly resolves to “don’t roll your own crypto”.
I’ve grown to hate this HMAC thing
Some bonus teasers:
Hopefully more to come on that last one.
Code
If you’d like to play around with this code, you can get it here:
svn checkout http://code-poetic.googlecode.com/svn/trunk/java/digesting pwDigester
This code is far from perfect and far from a ‘complete solution’. I’ve of mixed-mind on the effort. 1) It took me about 3 hrs to write the code (about 2.5 hrs to write the email thread & about 1.75 hrs to blog it). So, this is less than a day’s work, clocking in at about ~384 LoC of commented code. From this perspective, it’s impossible to defend not doing it incorrectly–there’s simply not enough work for it to be prohibitive.
…on the other hand, a lot of ‘know-how’ went into that code and (again, it’s not great). Amit’s work, Chandu’s work, Del’s commentary, and my prior work all had a hand in that small code base. My hope is that the process by which I decomposed the problem and the resulting design/implementation will be instructive and will help the reader get beyond the often quoted solution, “Oh it’s simple”…
-jOHN
(*1) – Salt : The purpose of a salt in a cryptographic or hash function is to cause two identical plaintext to appear different in ciphertext form that that duplicates can not be detected. This is particularly useful in password digest schemes, as users may use the same (lame) password as one another “12345”, “linkedInSuX”,…) In order to apply other attack resistances please look up cryptographic nonces, padding, hmacs, and prepare for a long rabbit hole.
(*2) – Length extension attack
(*3) – while (i++ < 1000) { digest = Hash(digest); } : https://www.owasp.org/index.php/Hashing_Java
(*4) – Use of SecureRandom : Proper use of Java SecureRandom