Hash Functions (Hash Algorithms)

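A hash function is a one-way algorithm used widely in cryptography. It is often loosely described as a shortened form of encryption, but unlike full-scale encryption it is not meant to be reversed: it condenses its input into a short, fixed-length value.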

Hash vs Encryption

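Encryption is a broad term, and hash algorithms are often loosely grouped with encryption schemes. Strictly speaking the two differ: encrypted data can be recovered by anyone holding the right key, while a hash cannot be turned back into its input.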

Encryption - the process of converting information from its normal, comprehensible form into an obscured guise, unreadable without special knowledge.

Hash - a one-way algorithm, often used for passwords, that takes a variable-length input (the message) and always produces a fixed-length output called a hash, or message digest. The same input always produces the same output, and there is no key that recovers the message from the digest.
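
The fixed-length, repeatable behaviour described above can be illustrated with a minimal Python sketch. It assumes SHA-256 from the standard hashlib module as the hash function (the text does not name one at this point), and the sample messages are made up for illustration.

```python
import hashlib

def digest_hex(message: str) -> str:
    """Return the SHA-256 digest of a text message as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

# Illustrative inputs of very different lengths, including the empty message.
messages = ["", "mypass", "a much longer message " * 50]

for m in messages:
    d = digest_hex(m)
    # Every digest has the same length: 64 hex digits = 256 bits.
    print(len(m), len(d), d[:16] + "...")

# The same input always yields the same digest.
assert digest_hex("mypass") == digest_hex("mypass")
```

Whatever the input length, the printed digest length stays at 64 hex digits.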

Hash Collisions

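A collision occurs when two different messages produce exactly the same hash. Because inputs can be of any length while the output has a fixed length, collisions must exist; hash algorithms are designed so that finding one is impractical. Some algorithms, such as MD5, have nevertheless been shown to have collisions.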
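
To make the idea of a collision concrete, here is a small Python sketch that searches for one in a deliberately weakened hash. The weakened hash (the first 16 bits of SHA-256) and the "message-N" inputs are assumptions chosen only so the search finishes instantly; with a full-length digest the same search would be impractical.

```python
import hashlib

def tiny_hash(message: bytes) -> bytes:
    """A deliberately weak 16-bit hash: the first 2 bytes of SHA-256."""
    return hashlib.sha256(message).digest()[:2]

def find_collision():
    """Hash successive messages until two different ones share a tiny_hash value."""
    seen = {}  # maps tiny hash -> first message that produced it
    counter = 0
    while True:
        message = f"message-{counter}".encode("ascii")
        h = tiny_hash(message)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1

first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

With only 65,536 possible outputs, a collision is guaranteed within 65,537 messages and usually appears far sooner.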

A Hash Example 

Website User Registration and subsequent Login

  1. A user goes to a website and clicks a button that says "New User Registration".

  2. Unknown to the user, his browser has downloaded the hash algorithm as Java code, which begins running in the computer's memory.

  3. When he types in his user ID it is not hashed, but when he then types in a password (a short message), the Java hash routine converts it into a longer message digest, or hash. For this example, he types in his password "mypass", but before it is sent to the web server, the Java hash algorithm running on his machine converts it to the hash "5yfRRkrhJDbomacm2lsvEdg4GyY=".

  4. The web server stores the hash (not the original message) in a database as "5yfRRkrhJDbomacm2lsvEdg4GyY=". IMPORTANT: the web host never sees the actual password, but stores only the hash in its database.

  5. The next time the user connects to the site, he types in his ID and password ("mypass"), which the Java routine converts to "5yfRRkrhJDbomacm2lsvEdg4GyY=". The server compares "5yfRRkrhJDbomacm2lsvEdg4GyY=" to the hash stored in its database; it matches, and the user is granted access. Since the server stores only the hash, it NEVER has to decrypt anything.

    NOTE: if he typed a wrong password, such as "mypass1", an entirely different hash would be created. It would not match the hash in the server's database, and he would be blocked.

    IMPORTANT - How this protects the system from unauthorized users logging in: if an individual somehow got access to the server database, all they would have is the hash (5yfRRkrhJDbomacm2lsvEdg4GyY=), and not the password (mypass). When they connect to the website and are prompted for a login and password, they will get access only if they type "mypass", but all they know is the hash. Even if they manage to view the Java code and see the exact algorithm that converted the password to the hash, it is very difficult, if not impossible, to reverse the process and find the password from the hash (see the detailed example further down for full details). One caveat: a hash intercepted on its way to the server could be replayed by a program that sends it directly instead of typing into the login form, so the connection itself should also be encrypted.
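
The registration and login steps above can be sketched in Python. This is only an illustration under stated assumptions: the scheme is taken to be a SHA-1 digest encoded as Base64 (the 28-character example hash is consistent with that, but the text does not say so), and the Server class, its method names, and the user ID "jsmith" are hypothetical stand-ins for the website.

```python
import base64
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Assumed scheme: SHA-1 digest of the password, Base64-encoded."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

class Server:
    """Hypothetical server that stores only password hashes."""

    def __init__(self):
        self.stored = {}  # user ID -> password hash

    def register(self, user_id: str, password_hash: str) -> None:
        self.stored[user_id] = password_hash

    def login(self, user_id: str, password_hash: str) -> bool:
        expected = self.stored.get(user_id)
        if expected is None:
            return False
        # Constant-time comparison of the two hash strings.
        return hmac.compare_digest(expected, password_hash)

server = Server()
server.register("jsmith", hash_password("mypass"))        # steps 3-4
print(server.login("jsmith", hash_password("mypass")))    # step 5: True
print(server.login("jsmith", hash_password("mypass1")))   # wrong password: False
```

The server object never holds the string "mypass"; it sees and compares hashes only.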

Hash algorithms take a string (or message) of any length as input and produce a fixed-length string as output; not all such algorithms are suitable for use in cryptography. The output is sometimes termed a message digest or a digital fingerprint. The term "hash" is derived from the breakfast dish, since it is composed of a bunch of minced-up pieces of food:

hash (hăsh)

  1. A dish of chopped meat, potatoes, and sometimes vegetables, usually browned.
  2. A jumble; a hodgepodge
  3. A mess: made a hash of the project
  4. to chop into pieces; mince


SHA (Secure Hash Algorithm)

NIST supports five hash algorithms, collectively called SHA, for generating a condensed representation of a message (a message digest). The five algorithms are SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512, and they are detailed in FIPS 180-2. When a message of any length < 2^64 bits (for SHA-1, SHA-224, and SHA-256) or < 2^128 bits (for SHA-384 and SHA-512) is input to an algorithm, the result is an output called a message digest. The message digests range in length from 160 to 512 bits, depending on the algorithm.
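
As a quick check of the digest lengths quoted above, the following Python sketch asks the standard hashlib module for each of the five algorithms. The sample message b"abc" is an arbitrary choice for illustration.

```python
import hashlib

# The five SHA algorithms named above, as exposed by Python's hashlib.
algorithms = ["sha1", "sha224", "sha256", "sha384", "sha512"]

message = b"abc"  # arbitrary sample message

sizes = {}
for name in algorithms:
    h = hashlib.new(name, message)
    sizes[name] = h.digest_size * 8  # digest_size is reported in bytes
    print(f"{name:7s} {sizes[name]:4d} bits  {h.hexdigest()[:16]}...")
```

The reported sizes run from 160 bits for SHA-1 up to 512 bits for SHA-512.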

MD5 (Message-Digest algorithm 5)

*** RFC 1321

MD5 is a widely used message digest algorithm (aka, cryptographic hash function) with a 128-bit hash value. It is not merely a checksum generator, though the term is sometimes imprecisely used. It is one of a series of message digest algorithms designed by Professor Ronald Rivest of MIT. When some analytic work indicated that MD5's predecessor, MD4, was likely to be insecure, MD5 was designed in response, in 1991. This indication was subsequently confirmed when weaknesses were found in MD4 in 1994 (Dobbertin, 1998).

MD5 has been widely used, and was originally thought to be cryptographically secure. However, work in Europe in 1994 uncovered weaknesses which make further use of MD5 questionable. Specifically, it has been shown that it is computationally feasible to generate a collision, that is, two different messages with the same hash. Unlike MD4, it is still thought to be very difficult to produce a message with a given hash. In 2004, a distributed project with the name MD5CRK was initiated to demonstrate that MD5 is insecure by finding a collision. Because of these concerns, many security researchers and practitioners recommend that SHA-1 (or another high quality cryptographic hash function) be used instead of MD5. SHA-1 has itself since been shown to be vulnerable to practical collision attacks, so a member of the SHA-2 family such as SHA-256 is the safer choice today.

MD5 hashes (or message digests) are commonly represented as a 32-digit hexadecimal number. A sample looks like this (using characters 0-9, a-f):

34048ce4cd069b624f6e021ba63ecde5

The MD5 hash (sometimes called md5sum, for MD5 checksum) of a zero-length string is:

d41d8cd98f00b204e9800998ecf8427e
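
The zero-length-string value above can be reproduced with Python's standard hashlib module. This is a small sketch for checking the representation only, not a recommendation to use MD5.

```python
import hashlib

# MD5 of the empty (zero-length) byte string.
empty_digest = hashlib.md5(b"").hexdigest()

print(empty_digest)        # d41d8cd98f00b204e9800998ecf8427e
print(len(empty_digest))   # 32 hex digits = 128 bits

# Every character is one of 0-9 or a-f.
assert set(empty_digest) <= set("0123456789abcdef")
```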

RIPEMD-160 (RACE Integrity Primitives Evaluation Message Digest)

RIPEMD-160 is a 160-bit message digest algorithm (and cryptographic hash function) developed in Europe by Hans Dobbertin, Antoon Bosselaers and Bart Preneel, and first published in 1996. It is an improved version of RIPEMD, which in turn was based upon the design principles used in MD4, and is similar in both strength and performance to the more popular SHA-1.

There also exist 128, 256 and 320-bit versions of this algorithm, called RIPEMD-128, RIPEMD-256, and RIPEMD-320, respectively. The 128-bit version was intended only as a drop-in replacement for the original RIPEMD, which was also 128-bit, and which had been found to have questionable security. The 256 and 320-bit versions diminish only the chance of accidental collision, and don't have higher levels of security as compared to, respectively, RIPEMD-128 and RIPEMD-160.

RIPEMD-160 was designed in the open academic community, in contrast to the NSA-designed algorithm, SHA-1. On the other hand, RIPEMD-160 is a less popular and correspondingly less well-studied design.

 


Previous Example in Detail - One-Way Hashing of a Password

This scenario is a perfect candidate for one-way hashing, whose output is also known as a message digest, digital fingerprint, or cryptographic hash (and, loosely, as "one-way encryption"). It is referred to as "one-way" because although you can calculate a message digest from some data, you cannot feasibly work out what data produced a given message digest.

A good hash is also collision-resistant: two different values can in principle produce the same digest, but it should be computationally infeasible to find such a pair. Another property of this digest is that it is a condensed representation of a message or a data file and as such it has a fixed length.

There are several message-digest algorithms used widely today.

 Algorithm    Strength (digest length)

 MD5          128 bit
 SHA-1        160 bit

SHA-1 (Secure Hash Algorithm 1) is slower than MD5, but its message digest is larger, which makes it more resistant to brute-force attacks. It is therefore recommended that you prefer the Secure Hash Algorithm to MD5 for all of your digest needs. Note that SHA-1 now has even stronger relatives, SHA-256, SHA-384, and SHA-512, which produce 256-, 384- and 512-bit digests respectively.

Typical Registration Scenario

Here is a typical flow of how our message digest algorithm can be used to provide one-way password hashing:

1) User registers with some site by submitting the following data:

 

 username    password

 jsmith      mypass

2) Before storing the data, a one-way hash of the password is created: "mypass" is transformed into "5yfRRkrhJDbomacm2lsvEdg4GyY=".

The data stored in the database ends up looking like this:

 username    password

 jsmith      5yfRRkrhJDbomacm2lsvEdg4GyY=

    
3) When jsmith comes back to this site later and decides to log in using his credentials (jsmith/mypass), the password hash is created in memory (in the session) and compared to the one stored in the database. Both values are equal to "5yfRRkrhJDbomacm2lsvEdg4GyY=" since the same password value "mypass" was used both times when submitting his credentials. Therefore, his login will be successful.

Note that any other plaintext password value will produce a different sequence of characters. Even a similar password value ("mypast") with only a one-letter difference results in an entirely different hash: "hXdvNSKB5Ifd6fauhUAQZ4jA7o8=".

 

 plaintext password    password hash

 mypass                5yfRRkrhJDbomacm2lsvEdg4GyY=
 mypast                hXdvNSKB5Ifd6fauhUAQZ4jA7o8=
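
How different two such hashes are can be measured by counting the bits in which they differ. The Python sketch below does this for the two passwords in the table, assuming SHA-1 as the digest algorithm (the hashes shown are consistent with a 160-bit digest, but the exact scheme is an assumption); the helper names are hypothetical.

```python
import hashlib

def sha1_bits(text: str) -> int:
    """Return the SHA-1 digest of text as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions in which the two digests differ."""
    return bin(sha1_bits(a) ^ sha1_bits(b)).count("1")

# One-letter difference in the input.
print(differing_bits("mypass", "mypast"), "of 160 bits differ")

# Identical input: no bits differ.
print(differing_bits("mypass", "mypass"), "of 160 bits differ")
```

For a well-designed hash, a one-letter change in the input typically flips roughly half of the output bits.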

   
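As mentioned above, provided that a strong hash algorithm such as SHA is used, it is computationally infeasible to reverse the hash "5yfRRkrhJDbomacm2lsvEdg4GyY=" back to "mypass".

Therefore, even if a malicious hacker gets hold of your password digest, he or she will not be able to compute your password from it directly. An easily guessed password can still be found by hashing likely candidates and comparing the results, so the password itself should be hard to guess.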