| ||
|
This community works best when people use their real names. Please
register for a free account.
Other Groups: Joel on Software Business of Software Design of Software (CLOSED) .NET Questions (CLOSED) TechInterview.org CityDesk FogBugz Fog Creek Copilot The Old Forum Your hosts: Albert D. Kallal Li-Fan Chen Stephen Jones |
MD5 and SHA1 are great for long chains of bytes such as files, but what if I want to hash smaller number of bytes. For example 128bits... You will ask why I even bother? Well, because I don't want to use the actual number and I don't want to be able to go back to the original 128-bit number from the hash. Are there any hash functions optimized for small number of bytes like this? Or can the existing ones be scaled down?
I'm guessing you're hashing passwords here, and so want the hash function to be computationally hard to invert with a low chance of collisions? If that's the case best stick to something reasonably crpytographically strong and tested. I think the old 'crypt' function used to encode passwords on unix produces quite a short hash, it's not considered strong enough these days though. I'd stick to SHA1 or MD5.
Matt Tuesday, May 31, 2005
It is kinda like what Matt said. I was thinking of using RIPEMD-160, but that didn't seem like a good thing since I will be going from 128bits to 160bits. That means I need more storage (which is fine if it is absolutely necessary). The other thing that got me thinking is that, my known universe of numbers in this case is 2^128 (fixed size). I will be mapping these into a universe of 2^160. I am no expert in hashing functions or crypto, but from what I understand, there should be no collisions really since there are more numbers available in the hash itself than the original number. However, doesn't this mean that it is easier to work backwards since there really is no loss of information which is the case if you have a very long stream of bytes used to generate the hash? CRC or checksum would probably work as long as I can prove that it is hard to work backwards from it. If I could go from 128bits to -say- 32bits, that would probably be good enough, but I don't want to create my own checksum function here. I need it to be cryptographically strong so that the mapping from 128-bits to 32-bits (or whatever) is well distributed, random, etc.. Any ideas where I can look?
CRC or checksum are useful for checking data integrity but aren't at all hard to invert. By 'hard' we mean 'the best known algorithm takes a powerful computer a prohibitively long amount of time', of course. Hash functions that are described as cryptographic are generally the ones which are hard to invert in this sense (amongst other desirable properties). Does the attacker have to get back to the exact same original string? or just obtain some arbitrary string that will produce the same hash? 'Invert' is generally used to mean the latter. If you're comparing hashed passwords then they only need to find something else that'll hash the same, it doesn't actually matter what the original password was once you've hashed it and thrown it away.
Matt Tuesday, May 31, 2005
The hash result is usually defined as a stream of bits, transformed to hexadecimal on output. If you have a 128 bit result and want only 32, simply omit any 96 bits and use the remaining 32 in any order and combination. Or you can cut it into 4 32-bit units and add them together. The internal hash calculation is already performed with 32 bit unsigned longs for the "lower" hashes. If you implement them yourself, you can output them any way you like, losing any amount of bits. Hashes like Tiger and SHA-512 use 64 bit arithmetic. Be aware, however, that you should only manipulate the end result this way, not something within the calculation. Even simple changes can completely break the well balanced security. And be aware of the birthday attack when chosing the required number of bits.
Secure Tuesday, May 31, 2005
> Does the attacker have to get back to the exact same > original string? Matt, Yes. Anything short of the original string is useless. > If you have a 128 bit result and want only 32, simply omit > any 96 bits and use the remaining 32 in any order and > combination. Or you can cut it into 4 32-bit units and add > them together. Secure, I would imagine this approach would not work well. AFAIK, the way the hash algorithms work is that the output strings have very small likelihood of colliding with each other by design. If I use a hash function designed for 160bits in my application which requires only 128bits by somehow eliminating 32bits (adding, dropping etc), I might have a much higher chance of the outputs colliding. Is that not true? Maybe the increase in likelyhood would be tolerable, maybe not. I would need to know this beforehand, but I have no clue how to calculate it.
I'm trying to think of a situation where you'd usefully need to store just a secure one-way hash of a string, but only getting the original string back would be considered cracking it. Are you using it in combination with some other means of verification? I mean, if the hash is used to verify/identify the original data in any way, then it will also verify anything else that hashes the same. And if it's not used for verifying the original data then what is it useful for? I think Secure's suggestion sounds good. If you're worried about the increased likelihood of collisions then don't use a short hash, full stop. If you're prepared to accept the slightly higher risk of collisions of using a 32-bit hash, then dropping some bits from a secure 128-bit hash should be an ok way of doing it, providing that the hash you use is reasonably uniform (which I think SHA is - I don't know the details though).
Matt Tuesday, May 31, 2005
As Secure mentioned, if you google on 'Birthday theorem' in combination with hash functions, you'll find how to calculate probabilities of collisions, given a few assumptions about the hash function you're using (regularity, in particular). Called that after the problem of finding the probability that two people in the room have the same birthday.
Matt Tuesday, May 31, 2005
>I'm trying to think of a situation where you'd usefully > need to store just a secure one-way hash of a string, > but only getting the original string back would be > considered cracking it. The scenario: Both you and I have a piece of information. We want to see if they are the same, but neither one of us wants to reveal the information we got. So, we both calculate the hashes of the information we got and exchange the hashes. Make sense? The hashes need to be secure so that either side can't figure out what the other side's information is. Collisions are of course a problem, so they need to be eliminated or kept to a very minimum.
> As Secure mentioned, if you google on 'Birthday theorem' > in combination with hash functions, you'll find how to > calculate probabilities of collisions, given a few > assumptions about the hash function you're using > (regularity, in particular). I will do that. Thanks.
Ah, I see. The upshot of Birthday is that (with a few assumptions) you need to hash approximately 2^(t/2) randomly-picked values in order to get probability > 1/2 that any pair of those collide, where the number of bits in your hash is t.
Matt Tuesday, May 31, 2005
Both SHA1 and MD5 have been broken. Obviously, they are still quite useful in a variety of applications due to the definition of 'break'. However, there's no excuse for using them in new applications these days. There are other, better, hash algorithms. SHA-512, for example. This, of course, is somewhat academic. For this particular problem, a CRC may even be sufficient.
IANA(expert) but you could look at PC1 http://membres.lycos.fr/pc1/ PC1 is allegedly RC4 and may be busted bigtime by now - if anybody could comment ...
trollop Tuesday, May 31, 2005
hasher masher, exchanging hashes to authenticate that both ends know the password eh? That means that your hash *is* a password, and you are passing it plaintext, right? Look at Diffe-Hellman (DH) key exchange.
i like i Wednesday, June 01, 2005
hasher masher: A cryptographic hash function is designed in such a way that the correlations between any of the bits is kept down to an absolute minimum that can't be used for an attack. That means that any bit has a chance of 50% for beeing either 0 or 1, without any influence of any other bit. Thus the strength of the hash is solely defined by the number of output bits, as long as the function itself is not broken. Additionally, because they are not correlated, you can take any amount of these bits and recombine them in any order, as long as no single bit is used more than once. Then the strength of the result is again solely defined by the resulting number of bits. If you have the full amount of 128 bits, you have 2^128 values with an equivalent security. If you use only 32 bits, of course you can't come above the equivalent security of 2^32 values. If you use them to compare information between different parties, you have a bunch of other problems, of course. E.g. when first A sends his hash to B, you must make sure that B doesn't simply repeat the previously received hash to fake his knowledge of the information. From your problems with the bit extraction I assume that you have not such a deep knowledge of these topics. A word of warning: You will have a high likelihood of producing pure snake oil without this knowledge. There are some useful hints and links in this thread: http://discuss.joelonsoftware.com/default.asp?design.4.134764.22
Secure Wednesday, June 01, 2005
> If you use them to compare information between different > parties, you have a bunch of other problems, of course. > E.g. when first A sends his hash to B, you must make sure > that B doesn't simply repeat the previously received hash > to fake his knowledge of the information. I did read most of the books in the link you posted. Perhaps I should reread them. How do I stop the scenario above?
hasher masher Wednesday, June 01, 2005
> If you have the full amount of 128 bits, you have 2^128 > values with an equivalent security. If you use only 32 > bits, of course you can't come above the equivalent > security of 2^32 values. I don't understand the inner workings of the hash functions. Can I assume that with 160bit output, I have 1 in 2^160 chances of running into collisions, and with 64bit output, I have 1 in 2^64 chances of running into collisions? Something tells me it is not that easy to figure out since I am sure not every single input maps to a single output although in my case the input range is constrained. I am using 128bit fixed-size inputs so there are more bits to represent data at the output than the input. If the hash function is truely random, then I would guess all my inputs should be getting mapped to specific outputs.
hasher masher Wednesday, June 01, 2005
You may possibly be looking for "zero knowledge" authentication protocols. Keep in mind that any homegrown protocol you come up with will probably be vulnerable to simple attacks (for instance, the hash repeat mentioned above which can be initially solved via challenge-response but is still vulnerable to man in the middle attacks). Find an existing, well known, protocol that matches your needs and follow it to the letter.
whatever Wednesday, June 01, 2005
I checked out the "zero knowledge" authentication protocol in Bruce Schenier's book. It sounds like what I need, but not entirely. I thought I could do the following: 1) A sends B its secret, but hashed. 2) B receives A's secret and either rehashes it with its secret as the key (HMAC?), or encrypts (symmetric) what A sent with its own secret. Then sends this new data back to A. 3) If A can get back a hash which matches its own secret's hash from what B sent, then B knows about A's secret. Otherwise, it doesn't. If A and B have different secrets, B can't figure out A's secret from the hash. However, could A figure out B's secret from its own hash and the encrypted version sent by B? I would guess not since that would mean it could break the symmetric encryption. As for the optimized hash, it sounds like I'll have to do the whole thing. It doesn't sound like it is a good idea to use part of the hash output although if the hash output is sufficiently large, I guess there would be relatively small chance of collisions.
hasher masher Wednesday, June 01, 2005
> "I don't understand the inner workings of the hash functions. Can I assume that with 160bit output, I have 1 in 2^160 chances of running into collisions, and with 64bit output, I have 1 in 2^64 chances of running into collisions?" < Exactly, as long as the output behaves purely random and the bits are uncorrelated. I.e. any single flipped bit in the input should flip about half of the output bits, but you can't predict them. BTW, if you have 64 bits, you *will* run into a collision with the (2^64)+1 th value, even when the previous 2^64 were all unique. When you have 11 beans, and you have to put them into 10 boxes, but only 1 bean per box, where will you put the 11th bean when eating it is not an option? > "If the hash function is truely random, then I would guess all my inputs should be getting mapped to specific outputs." < A hash function maps all possible inputs to a limited output. When you have more inputs than possible outputs, then you must have collisions. When you have less inputs than outputs, and they stay within the birthday attack range, then you have a good *chance* of unique hashes without any collision - but only a chance. Hashing 512 bit inputs (64 byte) into a 128 bit hash means that on average any possible hash value occurs 2^384 times for an equal distribution. > "As for the optimized hash, it sounds like I'll have to do the whole thing. It doesn't sound like it is a good idea to use part of the hash output although if the hash output is sufficiently large, I guess there would be relatively small chance of collisions." < Do you have mathematical reasons (for both "no good idea" and "small chance"), or is it based on a "bad feeling"? You know, secure cryptography is all about math. Feelings are the realm of snake oil: "You can feel secure because I say so, trust me." When you "guess" in cryptography, you are DOOMED to produce snake oil. > "I don't understand the inner workings of the hash functions." < > "As for the optimized hash, it sounds like I'll have to do the whole thing." < Putting these two together, what exactly makes you believe you can roll your own cryptographic hash function?
Secure Thursday, June 02, 2005
> Do you have mathematical reasons (for both "no good > idea" and "small chance"), or is it based on a "bad > feeling"? You know, secure cryptography is all about > math. Feelings are the realm of snake oil: "You can > feel secure because I say so, trust me." You already confirmed my suspicion. Let's say my hash algorithm was RIPEMD-128. If I only use 64bits out of 128bits, then starting with 2^64+1, I am running into the possibility of collisions. I don't know what the probabilty of having two hash outputs with identical bottom 64bits and different upper 64bits. I don't know how to calculate it, so I'll avoid it. > > "I don't understand the inner workings of the hash > > functions." > > "As for the optimized hash, it sounds like I'll have > to do the whole thing." > > Putting these two together, what exactly makes you > believe you can roll your own cryptographic hash > function? You misunderstood me. What I meant to say was I will use the output of the hash algorithm as it is without dropping any bits even though it will require more memory. That shouldn't produce any snake oil since I am sticking to the algorithm as it was intended. Any comments on my method of checking if B knows A's secret?
hasher masher Thursday, June 02, 2005
What is your threat model? What are you trying to achieve and what are you trying to protect against? You are being too vague.
whatever Thursday, June 02, 2005
> "I don't know what the probabilty of having two hash outputs with identical bottom 64bits and different upper 64bits. I don't know how to calculate it, so I'll avoid it." < Quite easy. If these 64 bits come from a secure cryptographic hash with expected behaviour as previously told, they behave just like a separate hash with a 64 bit output. I.e. if you have an n-bit hash, you need around 2^n trials to find an input for a given output, and around 2^(n/2) trials to find two arbitrary inputs hashing to the same output. > "Any comments on my method of checking if B knows A's secret?" < So far, so good. I'm no crypto-expert and didn't read too deep into challenge-response functionality. You should answer "whatever"'s questions first: There is no silver bullet, and your method solves some problems, but not all. My advices about snake oil apply to this, too. Security related protocols can just be as hard as crypto algorithms. Use one of the existing and peer reviewed schemes, don't try to roll your own, unless you really have to.
Secure Friday, June 03, 2005 | |
Powered by FogBugz
