The md5 module

Note: In Python 2.5 and later 2.x releases, this module is a deprecated compatibility wrapper for hashlib. It was removed in Python 3.

This module is used to calculate message signatures (so-called “message digests”).

The MD5 algorithm calculates a 128-bit signature. It was designed so that if two strings are different, it’s highly likely that their MD5 signatures are different as well. It was also designed so that, given an MD5 digest, it’s nearly impossible to come up with a string that generates that digest.
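To illustrate the claim that different strings produce different signatures, the sketch below hashes two strings that differ by a single character. It assumes Python 3 and the hashlib module rather than the md5 module used in the examples on this page; the helper name md5_hex is made up for this illustration.

```python
import hashlib

def md5_hex(text):
    # hashlib works on bytes, so encode the text first
    return hashlib.md5(text.encode("utf-8")).hexdigest()

a = md5_hex("spam, spam, and eggs")
b = md5_hex("spam, spam, and egg")   # one character shorter

print(a)
print(b)
print("digests differ:", a != b)
print("digest size in bits:", hashlib.md5().digest_size * 8)
```

The two hexadecimal strings have the same length (32 digits, or 128 bits) but no obvious relation to each other, even though the inputs are nearly identical.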

Note: Since this was written, MD5 has been broken. It’s now relatively easy to generate files that differ slightly, but have the same MD5 signature, if you can insert random-looking data somewhere in the file (e.g. in a comment, or in a part of the file that’s not used for any other purpose). Most published attacks use environments where such differences can be used to control the result in some way (e.g. a PostScript document that contains two texts, and code that selects which one to display, or a self-extracting executable that extracts different files). While applications that use MD5 to sign only the data that’s actually displayed or extracted should be safe for now, use of MD5 in new applications should be avoided.

Example: Using the md5 module
# File: md5-example-1.py

import md5

hash = md5.new()
hash.update("spam, spam, and eggs")

print repr(hash.digest())

'L\005J\243\266\355\243u`\305r\203\267\020F\303'
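Since the md5 module is only a wrapper around hashlib, the same digest can be computed with hashlib directly. The following is a minimal sketch assuming Python 3, where the data must be passed as bytes. It also shows that update can be called several times, with the same result as hashing the concatenated data in one call.

```python
import hashlib

h = hashlib.md5()
h.update(b"spam, ")
h.update(b"spam, and eggs")   # update can be called repeatedly

digest = h.digest()

print(repr(digest))
print(len(digest), "bytes")

# hashing everything in one call gives the same digest
print(digest == hashlib.md5(b"spam, spam, and eggs").digest())
```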

Note that the checksum is returned as a binary string. Getting a hexadecimal or base64-encoded string is quite easy, though:

Example: Using the md5 module to get a hexadecimal or base64-encoded md5 value
# File: md5-example-2.py

import md5
import string
import base64

hash = md5.new()
hash.update("spam, spam, and eggs")

value = hash.digest()

print hash.hexdigest()

# in Python 1.5.2 and earlier, use this instead:
# print string.join(map(lambda v: "%02x" % ord(v), value), "")

print base64.encodestring(value)

4c054aa3b6eda37560c57283b71046c3
TAVKo7bto3VgxXKDtxBGww==
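For comparison, here is a sketch of the same conversions assuming Python 3 and hashlib. Recent Python 3 releases no longer provide base64.encodestring, so this version uses base64.b64encode, which returns bytes without a trailing newline. The variable names hex_value and b64_value are chosen for this illustration only.

```python
import base64
import binascii
import hashlib

value = hashlib.md5(b"spam, spam, and eggs").digest()

# two hex digits per byte: 16 bytes -> 32 characters
hex_value = binascii.hexlify(value).decode("ascii")
print(hex_value)

# base64: 16 bytes -> 24 characters, including the "==" padding
b64_value = base64.b64encode(value).decode("ascii")
print(b64_value)

# both encodings can be converted back to the binary digest
assert binascii.unhexlify(hex_value) == value
assert base64.b64decode(b64_value) == value
```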

Among other things, the MD5 checksum can be used for challenge-response authentication (but see the note on random numbers below):

Example: Using the md5 module for challenge-response authentication
# File: md5-example-3.py

import md5
import string, random

def getchallenge():
    # generate a 16-byte long random string.  (note that the built-
    # in pseudo-random generator uses a 24-bit seed, so this is not
    # as good as it may seem...)
    challenge = map(lambda i: chr(random.randint(0, 255)), range(16))
    return string.join(challenge, "")

def getresponse(password, challenge):
    # calculate combined digest for password and challenge
    m = md5.new()
    m.update(password)
    m.update(challenge)
    return m.digest()

#
# server/client communication

# 1. client connects.  server issues challenge.

print "client:", "connect"

challenge = getchallenge()

print "server:", repr(challenge)

# 2. client combines password and challenge, and calculates
# the response

client_response = getresponse("trustno1", challenge)

print "client:", repr(client_response)

# 3. server does the same, and compares the result with the
# client response.  the result is a safe login in which the
# password is never sent across the communication channel.

server_response = getresponse("trustno1", challenge)

if server_response == client_response:
    print "server:", "login ok"

client: connect
server: '\334\352\227Z#\272\273\212KG\330\265\032>\311o'
client: "l'\305\240-x\245\237\035\225A\254\233\337\225\001"
server: login ok
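The weak point in this example is the challenge, which comes from the built-in pseudo-random generator. One possible variation, sketched here under the assumption of Python 3, draws the challenge from os.urandom and compares the responses with hmac.compare_digest. Both substitutions are additions for this sketch, not part of the original example, and the function names get_challenge and get_response are renamed versions of the ones above. MD5 is kept only to match the example.

```python
import hashlib
import hmac
import os

def get_challenge(size=16):
    # os.urandom draws bytes from the operating system's random source
    return os.urandom(size)

def get_response(password, challenge):
    # same construction as above: digest of password followed by challenge
    m = hashlib.md5()
    m.update(password)
    m.update(challenge)
    return m.digest()

# 1. server issues a challenge
challenge = get_challenge()

# 2. client and server each compute the response
client_response = get_response(b"trustno1", challenge)
server_response = get_response(b"trustno1", challenge)

# 3. compare_digest is designed not to reveal where two digests first differ
login_ok = hmac.compare_digest(server_response, client_response)
print("server:", "login ok" if login_ok else "login failed")
```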

A variation of this can be used to sign messages sent over a public network, so that their integrity can be verified at the receiving end.

Example: Using the md5 module for data integrity checks
# File: md5-example-4.py

import md5
import array

class HMAC_MD5:
    # keyed MD5 message authentication

    def __init__(self, key):
        if len(key) > 64:
            key = md5.new(key).digest()
        ipad = array.array("B", [0x36] * 64)
        opad = array.array("B", [0x5C] * 64)
        for i in range(len(key)):
            ipad[i] = ipad[i] ^ ord(key[i])
            opad[i] = opad[i] ^ ord(key[i])
        self.ipad = md5.new(ipad.tostring())
        self.opad = md5.new(opad.tostring())

    def digest(self, data):
        ipad = self.ipad.copy()
        opad = self.opad.copy()
        ipad.update(data)
        opad.update(ipad.digest())
        return opad.digest()

#
# simulate server end

key = "this should be a well-kept secret"
message = open("samples/sample.txt").read()

signature = HMAC_MD5(key).digest(message)

# (send message and signature across a public network)

#
# simulate client end

key = "this should be a well-kept secret"

client_signature = HMAC_MD5(key).digest(message)

if client_signature == signature:
    print "this is the original message:"
    print
    print message
else:
    print "someone has modified the message!!!"

The copy method takes a snapshot of the internal object state. This allows you to precalculate partial digests (such as the padded key, in this example).
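The effect of copy can be seen in isolation with a short sketch, assuming Python 3 and hashlib. A common prefix is hashed once, and each copy continues from that state independently. The prefix and suffix strings are arbitrary sample data.

```python
import hashlib

prefix = hashlib.md5()
prefix.update(b"shared prefix: ")   # hashed only once

results = {}
for suffix in (b"spam", b"eggs"):
    h = prefix.copy()               # independent snapshot of the state
    h.update(suffix)
    results[suffix] = h.hexdigest()

# each result matches hashing the full string from scratch
for suffix, value in results.items():
    assert value == hashlib.md5(b"shared prefix: " + suffix).hexdigest()

print(results)
```

Updating a copy leaves the original object untouched, so prefix can be reused for any number of messages.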

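For details on this algorithm, see HMAC-MD5: Keyed-MD5 for Message Authentication by Krawczyk et al. (the construction was later published as RFC 2104, HMAC: Keyed-Hashing for Message Authentication).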
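As a cross-check, the sketch below ports the construction to Python 3 as a plain function and compares its output with the standard library's hmac module. It assumes Python 3 and hashlib; the function name hmac_md5 and the sample message are made up for this illustration. Padding the key with zero bytes is equivalent to XOR-ing only the first len(key) pad bytes, as the class above does.

```python
import hashlib
import hmac

def hmac_md5(key, data):
    # hand-rolled HMAC-MD5, following the class above (block size 64)
    if len(key) > 64:
        key = hashlib.md5(key).digest()
    key = key.ljust(64, b"\0")              # pad the key to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.md5(ipad + data).digest()
    return hashlib.md5(opad + inner).digest()

key = b"this should be a well-kept secret"
message = b"spam, spam, and eggs"

# the standard library implementation of the same construction
expected = hmac.new(key, message, hashlib.md5).digest()

print(hmac_md5(key, message) == expected)
```

In practice the hmac module is the more convenient choice; the hand-written version is mainly useful for seeing how the inner and outer digests fit together.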

Warning: Don’t forget that the built-in pseudo-random number generator isn’t really good enough for cryptographic purposes. Be careful.