Common Log Format

March 2004 | Fredrik Lundh

Here’s a simple regular expression for parsing web server log files in the Common Log Format.

 
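import re

# Fields: host, identity, user, [date], "request", status, size
p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    )

# "access.log" is a placeholder; substitute the path to your own log file
with open("access.log") as file:
    for line in file:
        m = p.match(line)
        if not m:
            continue  # skip lines that don't match the format
        host, ignore, user, date, request, status, size = m.groups()
        ...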
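
As a quick check, the pattern can be tried on a single line before running it over a whole file. The following is only a sketch: the sample line, the parse_line helper and the dictionary keys are made up for illustration, and the date conversion assumes the usual day/month/year:time zone layout and an English locale for the month name.

```python
import re
from datetime import datetime

p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    )

# Sample line, for illustration only
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')

def parse_line(line):
    """Return a dict of converted fields, or None if the line doesn't match."""
    m = p.match(line)
    if not m:
        return None
    host, ignore, user, date, request, status, size = m.groups()
    return {
        "host": host,
        "user": None if user == "-" else user,
        # assumed timestamp layout: 10/Oct/2000:13:55:36 -0700
        "time": datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z"),
        "request": request,
        "status": int(status),
        # servers log "-" when no body was sent
        "size": 0 if size == "-" else int(size),
    }

record = parse_line(sample)
print(record["host"], record["status"], record["size"])
```

Returning None for lines that don't match mirrors the continue in the loop above, so the caller decides whether to skip or report them.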

Here’s a variation that parses the Extended Common Log Format (Apache calls it the Combined Log Format), which adds referrer and user-agent fields.

 
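import re

p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    r' "([^"]*)" "([^"]*)"' # extensions
    )

# "access.log" is a placeholder; substitute the path to your own log file
with open("access.log") as file:
    for line in file:
        m = p.match(line)
        if not m:
            continue
        # parentheses let the unpacking target span two lines
        (host, ignore, user, date, request, status, size,
            referer, agent) = m.groups()
        ...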

Comment:

I'm using the regular expression from http://effbot.org/zone/re-common-log-format.htm. I have a problem with it, though: I have some lines in my Apache log which contain something of the form "GET /\" HTTP/1.1". So they have a " which is kind of 'escaped' by a \. I was trying to change the regex to take this into account, but I've come up dry... Here are some things I tried:
' "((?:(?:[^"])|(?:\\"))*)" ' (matching w/ non-matching groups)
' "([^"(?:\\")]*)" ' (non-matching group inside character class)
' "([^"]*)(?<!\\\)" ' (negative lookbehind assertion)
Some friends proposed this:
' "([^"]*(?<!\\\\)(\\\\(\\\\\\\\)*"[^"]*)*)" '
But it's both monstrous and fails in the case of more \"\" things... I was wondering if you have any suggestions? (And kind of reporting a 'bug', I guess.)

Posted by Manuzhai (2006-06-20)
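
One possible approach to the escaped-quote problem described above, offered as a sketch rather than a tested fix: let the quoted field consume either ordinary characters or a backslash followed by any single character. This assumes the server writes a literal quote as \" and a literal backslash as \\. The QUOTED name and the sample line are made up for illustration.

```python
import re

# Quoted field that tolerates backslash escapes: each step consumes either
# one character that is neither a quote nor a backslash, or a backslash
# together with the character that follows it.
QUOTED = r'"((?:[^"\\]|\\.)*)"'

p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] ' + QUOTED + r' ([^ ]*) ([^ ]*)'
    )

# Sample line modelled on the request described in the comment
line = r'127.0.0.1 - - [20/Jun/2006:10:00:00 +0200] "GET /\" HTTP/1.1" 404 208'

m = p.match(line)
host, ignore, user, date, request, status, size = m.groups()
print(request)       # GET /\" HTTP/1.1
print(status, size)  # 404 208
```

With the original "([^"]*)" group, the same line still matches, but the request field ends at the escaped quote and the status and size fields pick up the wrong values. The same QUOTED fragment could stand in for the referrer and user-agent groups in the extended pattern.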

Comment:

Might be interesting to look at Perl's Apache::LogRegex module (in CPAN) - has a pretty nice approach whereby you pass it your Apache custom log format and it generates a regex for parsing lines from it, returning a dict with the format identifiers as keys. Not sure if it's capable of handling everything but one or two bugs seem to have been slayed in 5 releases.

Posted by Harry Fuecks (2006-12-05)

Comment:

Now implemented: http://webtuesday.ch/~harryf/code/apachelog/ - either just download or $ bzr get http://webtuesday.ch/~harryf/code/apachelog/

Posted by Harry Fuecks (2006-12-08)
