Sorry, not available in this language yet

English
日本語

Securing password digests, or how to protect lonely unemployed radio listeners

Jun 10, 2012 / 11 min read

Table of Contents

A few caveats
What we learned Storing passwords in the clear is unacceptable, as best practice states
Digesting stored passwords is not itself a best practice
We now know, personally, how a stolen HASH(PW) *feels*: It violates user trust
Attackers can determine what HASH() scheme used from only digest data
Hash dictionaries exist online
A bit of threat modeling
[T1] – Internet-based attacker – Access to the app
[T2] – LAN-based attacker, access to database
How we fix it
Quick best practices check

As we’re prone to say, “much ink has been spilt over the release of password digests” from LinkedIn and others. I’m, as is typical, profoundly disappointed in that amount of misinformation I’ve heard in security folks’ commentary on the problem and the underlying workings of digests, HMACs, and so forth. This blog entry represents a roll-up of a great discussion we had internally on our software security group mailing list.

A few caveats

We do not recommend only a digested password (salted or otherwise, MAC’d or not) as a best practice for password storage. Consider this work instructional only.
Hardening a salted hash introduces properties to a system which in turn require threat modeling for respective weaknesses. You may quickly find it’s turtles all the way down.
Digested passwords, salted or otherwise, create a non-trivial maintenance problems (on compromise, rotation, etc.) for AuthN administrators because of their non-reversibility.
An app’s threat model may demand focus on many other aspects of AuthN design prior to password storage.

This entry treats the issue and its mitigation more thoroughly than press / blogs have for purposes of secure design instruction. Other entries leave developers with a lot of questions and opportunity to mess things up.

What we learned Storing passwords in the clear is unacceptable, as best practice states

This latest example reinforces what we know: Whether through injection or other means, attackers regularly succeed in bulk exfiltration of sensitive data.

Digesting stored passwords is not itself a best practice

This resonates immediately with many security practitioners. However, some organizations for which we do assessments still push back on this. Policy varies by organization, but if you’re committed to digesting with hashes, best practice entails:

Use a digesting function currently not known to suffer from collisions.
Use such functions securely so as not to weaken its attack resistance; Specifically, prevent length extension vulns, Salt(*1) data to be hashed (PW) to avoid duplicate detection, and avoid salt/src-data collisions.

Through this post, I present a solution that meets these requirements.

We now know, personally, how a stolen HASH(PW) feels: It violates user trust

The first thing we all did was (1) change our passwords and (2) tell everyone we know to reset THEIR passwords. I recently explained to a customer, “When an individual’s password is stolen, you’ve often compromised their credentials at several sites. How many people have a single password that they use for several ‘low risk’ sites? When we deploy an application, we have a (metaphorically) fiduciary responsibility to protect users’ credentials.”

Attackers can determine what HASH() scheme used from only digest data

We quickly knew we were dealing w/ SHA1() unsalted. After an assessment, some application owners will indicate “Even if they could steal our HASH(PW) database, they couldn’t apply a rainbow table because they wouldn’t know our protocol/scheme.”

Obscurity provides some value to raising the difficulty of reversal, as long as it doesn’t weaken the cryptographic properties of a solution. It doesn’t always take a cryptanalyst (or even a security consultant) long to figure these things out. Attackers of public systems, such as LinkedIn, have access to freely create their own account (or accounts). With this access an attacker can craft their own PW,HASH(PW) tuple by creating / hacking their own account. This means they have a ready-made test to help prove their hash scheme hypotheses.

Hash dictionaries exist online

If you’re having trouble with the previous step, Google the recovered hashes, and you’ll see hits for MD5 and SHA1. These dictionaries can be used to bootstrap a salt-specific rainbow table.

A bit of threat modeling

Before we proceed towards a solution, let’s define rules of engagement by considering a fixed set of threats and attack vectors we’ll protect against. (T = Threat, AV = Attack Vector)

[T1] – Internet-based attacker – Access to the app

[AV1] – Inject database, lift entire table of HASH(PW)
- Studies show that the majority of password lifts also provide UNs
- Attacker looks for DUPS in HASH(PW)
[AV2] – Observe single user’s HASH(PW), attempt to reverse
- T1 can see the victim’s material AS WELL AS their own
[AV3] – Impersonate login by using hash attack
- Presumes T1 access to surface where hash computed/present
- Replay stolen from AV1 or AV2 or
- until collides with what observed through AV1 or AV2
[AV4] – PRNG attacks
- Attacks may DoS the app (not just random) by causing the PRNG to block on more entropy
- Improper use of PRNG may lead to cryptanalysis or exploit

[T2] – LAN-based attacker, access to database

[AV1] – Pul export of material through load/reporting interfaces

These vectors will help us couch the remainder of the discussion.

How we fix it

I spent some time to hack up a prototype solution, in Java, and share it here because I can’t find a good (concrete) example without limitations with regard to my stated best practices above. Two things are interesting:

It’s just not that much code.
Even without rolling our own crypto, getting that code correct is surprisingly subtle.

Look at the solution’s simple form:

[STORED_FORM] = [SALT] + HASH ([SALT] + [PW])

Two issues remain: How do we:

Specify and scope the [SALT]?
Avoid hash() length extension attacks? (*2)

[SALT Spec]

Specifying a single salt at the scope of the whole password database provides **NO** security under T1.AV1, AV2, or AV3. This conclusion holds regardless of salt size. Why? Because T1.AV3 presumes access to a single salted hash, which gives away the prefix needed to compute your new app-specific rainbow table. Put another way: Using the same salt for all passwords defeats the salt’s only purpose (de-duping cipher-/digest-texts). See (*1).

***Also remember: in case (T1.AV2) (a threat gets access to a victim’s digest) they will have to create a custom rainbow table to break reverse the hash, but creating this rainbow table is NO more difficult than creating one for an unsalted hash (because the salt is known).

So, do we need to protect the storage of the salt? That is, do we need to store it in a separate database table, or outside the database entirely? This fret falls into the category of obscurity: an attempt to slow an attacker down. Again, see (*1). There are more elegant ways to protect a bulk digest lift (T1.AV1) from reversal anyways.

[hash() length extension attacks]
HASH() length extension attacks have been covered (*2). Summarizing, we need to make sure that attackers can’t craft a conforming hash along particular (often internal or MiM) surfaces (as described by T1.AV3). Various authors describe the generic fix to this as:

[DIGEST] = HASH([KEY] + HASH([KEY] + [PW]))

The easier form of this is simply:

[SALT][DIGEST] = HMAC([KEY] + [SALT] + [PW])

I have chosen the latter construct in my implementation.

There are subtle and interesting differences between a solid hmac() implementation and the form above it. For the purpose of this entry, let’s continue confident that the chosen hmac will in fact (1) provide an inner and outer hash() (2) that construction controls length attacks, and (3) that same construction is designed to leak no information about the key through the digest-text.

This selection also means our implementation does not have to chain hashes together (*3). While hash chaining is a kind of obscurity that will provide value without negatively affecting other cryptanalytic properties, it also has a performance impact. I vehemently resist such design additions where alternatives exist.

[Misc Design Properties]

In order to conform to best practice and resist all the stated attack vectors, the design need a few more pushes:

Use collision-resistant crypto functions.
- HMAC() algorithm == SHA256
- Salt PRNG == SHA1PRNG
Avoid salt/data collisions. The implementation fixes the salt size and uses a PRNG so that:saltX + dataX != saltY + dataY
Some implementations will fail in the following way:
SX = 123
DX = 45
SY = 12
DY = 345
Reuse PRNGAs. Amit points out in (*4): watch for how greedily apps use randomness. Having chosen a largish SALT the design comes to rely on the SALT-size to ensure that creation of a comprehensive rainbow table (for ALL users) would be impossibly difficult. (Please refer to the “***Also Remember” above regarding targeted attack.) That implementation referenced below chose an arbitrary PRNG reset interval of 512 password creations. This may be too often, too infrequent, or just right. The answer depends on the emission rate of quality entropy from /dev/srand.Production software I’ve written has locked machines in test because they’re out of entropy (T1.AV4). This doesn’t just block crypto functions: I’ve seen it halt response to HTTP requests: not cool.
Follow Application Security Best Practices (hygiene).
- Explicitly call out both algorithms and providers
- Protect underlying digest representation (encapsulation, zeroing them out where possible, etc.)

Quick best practices check

At this point, we should have met our best practices and resisted the stated attack vectors. With only a limited amount of time to consider the solution/code, several great points sprung up internally that I’ll share.

[HMAC creates a Key Protection Problem]

Son-of-A… The referenced implementation does nothing to protect the HMAC secret key. Ultimately, this is because I whipped the code up in a few hours–it’s a prototype. I would fix this in a real fielded solution. In the meantime, a few mitigating factors:

The HMAC key, if stored only server-side, in an application container, it remains separate from the database itself, and will therefore not be exfiltrated by a SQLI attack. This keeps it separate and protected as long as its not compromised by some sort of stack trace / error handling problem within the Java code. In other words, T1.AV1, AV2, and AV3 should not compromise this key. In order to retain key secrecy under T1.AV4, the DB-ADMIN must be separate from the APPSERVER-ADMIN.

Thus, it does its job of introducing non-reversibility to the digest, by rainbow table or otherwise. Look at the key itself: it could stand to be longer and more diverse (character-wise) in order to resist brute-force discovery.

Without further modification, the HMAC continues to do its job of length-extension attack prevention.

What if design demands client-side digest ‘expose’ HMAC(K1) to attackers (under T1.AV2)? One might argue this still protects from MiM observation of plaintext credentials in cases where SSL/TLS fail (but not for public sites where anyone can become T1). The net of this circumstance is that the solution sacrifices the reversing hurdle of HMAC(k1) and backs out to:

Reversibility(O) = O(salt + hash())
Rainbow Table(Size) = NumberOf(Users)

…which isn’t bad but again isn’t best practice. It’s where we ‘started’. Ultimately, deploying an HMAC() solution to the client, for this reason, is probably unneeded complexity (See initial caveats plz).

In any case, please queue the music on additional AppSec hygiene: we’ll need to buy some “secure key storage”, “secure key backup”, and “key rotation”.

[HMAC creates a Key Rotation Problem]
Technically, this problem isn’t “unique” to the HMAC; plain hash digests possess it as well. The points are these:

When an attacker compromises the scheme, having hard-coded the key material into the application means I must re-generate keys, recompile, build a deployable unit again, and promote. This is VERY bad.
Regardless of scheme used, administrators and developers need a way of dealing with the existing (compromised password database and a rotated shame/set. Without scaffolding, this will require custom code (see above bullet) and one-off operational behavior. This is also VERY bad.

I’ve grown to hate this HMAC thing

Fair enough. As alluded to early in the entry, adding security features can add complexity. Replace HMAC() with:

SALT+DIGEST = SHA256(NONCE + SHA256(NONCE + SALT + DATA))

We have to re-consider the threat model and applicable best practices AGAIN if we adopt this new solution. Let me provide some cache hits rather than the full explanation:

If passwords are of fixed max size (<= 64B) you need not provide additional structure to avoid stated SALT / DATA collision attack. This is however tricky as some systems check password size only on user registration, not login.
Length extension should be avoided b/c DATA only affects the 1st HASH round and is thus normalized to fixed output size (64B).
If you’re going to play w/ the hash function then you should size the SALT to be at least as large as the hash’s block (not output) size. Using this rule on SHA256 (my chosen HMAC) one would chose a 64B salt rather than my 32B. This is, in my understanding, is import only to hash functions avoiding extension attacks and is an unnecessary constraint on HMACs.
This solution exchanges the reversing hurdle of HMAC(k1) with NONCE, protect it like HMAC(K1) above… or back out to:
Reversibility(O) = O(salt + hash())
Rainbow Table(Size) = FunctionOf(, , and sizeOf(data))
This scheme as specified does not resist cryptanalytic attack as well an HMAC().

Why? cKetkar explains:

HMAC masks your key during the first — inner — hash with a fixed constant (ipad). Then on the second — outer — hash it masks your key again with a different fixed constant (opad).
The masking operations result in a different inner and outer key value, and the entire process effectively seals your message, hides your key, and makes it impossible to tack new data on the end. Also note that HMAC specification specifies values of ipad and opad to MAXIMIZE the Hamming distance between the resulting (K XOR ipad) and (K XOR opad).

So the detailed HMAC form is –

H( (K XOR opad), H( (K XOR ipad), M) )

We could add in XOR operations into the HASH I’ve suggested above but there’s too much to get wrong/right there. Just use a HMAC or live with what I specified. This quickly resolves to “don’t roll your own crypto”.

I’ve grown to hate this HMAC thing

Some bonus teasers:

Consider the purpose of HMAC(Key) and Salt. Are both necessary?
What scope must be applied to HMAC(key) if we remove the Salt?
How do previous questions affect Threat Model and AppSec hygiene?
How might you design a digested password storage scheme to handle rotation (of algos, keys, salts, etc.)
And, of course: Isn’t it amazing we’re still messing around with protecting 1970’s password storage schemes?

Hopefully more to come on that last one.

Code

If you’d like to play around with this code, you can get it here:

svn checkout http://code-poetic.googlecode.com/svn/trunk/java/digesting pwDigester

- You will want JUnit 4.x
- J2SE 1.5+
- You will need the SUN crypto provider for this to work out of the box. Windows users season to taste.
- I actually had the flu when I wrote this blog/code. Mileage may vary

This code is far from perfect and far from a ‘complete solution’. I’ve of mixed-mind on the effort. 1) It took me about 3 hrs to write the code (about 2.5 hrs to write the email thread & about 1.75 hrs to blog it). So, this is less than a day’s work, clocking in at about ~384 LoC of commented code. From this perspective, it’s impossible to defend not doing it incorrectly–there’s simply not enough work for it to be prohibitive.

…on the other hand, a lot of ‘know-how’ went into that code and (again, it’s not great). Amit’s work, Chandu’s work, Del’s commentary, and my prior work all had a hand in that small code base. My hope is that the process by which I decomposed the problem and the resulting design/implementation will be instructive and will help the reader get beyond the often quoted solution, “Oh it’s simple”…

-jOHN

(*1) – Salt : The purpose of a salt in a cryptographic or hash function is to cause two identical plaintext to appear different in ciphertext form that that duplicates can not be detected. This is particularly useful in password digest schemes, as users may use the same (lame) password as one another “12345”, “linkedInSuX”,…) In order to apply other attack resistances please look up cryptographic nonces, padding, hmacs, and prepare for a long rabbit hole.

(*2) – Length extension attack

(*3) – while (i++ < 1000) { digest = Hash(digest); } : https://www.owasp.org/index.php/Hashing_Java

(*4) – Use of SecureRandom : Proper use of Java SecureRandom