The ElementRXP module

Fredrik Lundh | February 2005 | Originally posted to online.effbot.org

Here’s a simple module that uses the PyRXP parser to build an element tree:

# File: ElementRXP.py

try:
    from cElementTree import Element
except ImportError:
    from elementtree.ElementTree import Element

try:
    from pyRXPU import Parser
except ImportError:
    # fall back on ASCII-only parser
    from pyRXP import Parser

def fixelement((tag, attrib, children, spare)):
    elem = this = Element(tag, attrib)
    for child in children:
        if isinstance(child, tuple):
            # child element: convert it and attach it to the parent
            this = fixelement(child)
            elem.append(this)
        else:
            # add text fragments to the right place
            if this is elem:
                this.text = child
            else:
                this.tail = child
    return elem

def parse(file):
    if not hasattr(file, "read"):
        file = open(file)
    p = Parser(ExpandEmpty=1)
    return fixelement(p.parse(file.read()))

This is faster than the Python version of ElementTree, but a lot slower than plain cElementTree. However, the PyRXP(U) library supports DTD validation, which can come in handy in some applications.
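
To see where fixelement puts text without installing PyRXP, here's a minimal sketch that feeds it a hand-written tuple tree in the (tag, attrib, children, spare) layout. The sample tuple is made up for illustration and is not real parser output. The sketch also assumes the standard library's xml.etree.ElementTree in place of cElementTree, and it unpacks the tuple inside the function body, since tuple parameters in a def are Python 2 only.

```python
from xml.etree.ElementTree import Element, tostring

def fixelement(node):
    # same logic as in ElementRXP.py, with the tuple unpacked in the body
    tag, attrib, children, spare = node
    elem = this = Element(tag, attrib or {})
    for child in children or []:
        if isinstance(child, tuple):
            # child element: convert it and attach it to the parent
            this = fixelement(child)
            elem.append(this)
        elif this is elem:
            # text before the first child element belongs to the parent
            this.text = child
        else:
            # text after a child element becomes that child's tail
            this.tail = child
    return elem

# hypothetical stand-in for the tuple tree of "<p>Hello <b>big</b> world</p>"
sample = ("p", {}, ["Hello ", ("b", {}, ["big"], None), " world"], None)

root = fixelement(sample)
serialized = tostring(root, encoding="unicode")
print(serialized)  # <p>Hello <b>big</b> world</p>
```

The text before the b element ends up in the parent's text attribute, and the text after it ends up in the b element's tail, which is why the loop keeps track of the most recently added element in this.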