ElementTree: Bits and Pieces

Code samples that don’t fit anywhere else (yet).

Getting all text from inside an element

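The text attribute contains only the text immediately inside an element, up to its first subelement. It does not include text inside subelements, or text between them (which is stored in each subelement's tail attribute). To get all text, you can use something like: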

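def gettext(elem):
    # text before the first subelement (may be None)
    text = elem.text or ""
    for e in elem:
        # all text inside the subelement, recursively
        text += gettext(e)
        # text that follows the subelement, up to the next sibling
        if e.tail:
            text += e.tail
    return text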
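On ElementTree 1.3 and later, including xml.etree.ElementTree in Python 3, the itertext method should give the same result as the function above for trees that contain no comments or processing instructions. A minimal sketch, assuming the standard-library module and some made-up sample markup:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>bold <i>world</i></b>!</p>")

# itertext() yields root.text, then the text and tail of every
# subelement in document order (root's own tail is not included)
text = "".join(root.itertext())
print(text)  # Hello bold world!
```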

Removing elements

To remove an element from a tree without losing its contents, you have to replace the element with those contents. This includes not only its subelements, but also its text and tail attributes.
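A plain remove call shows why the tail matters: the tail belongs to the removed element, so it disappears along with it. A small sketch, assuming Python 3's xml.etree.ElementTree and made-up markup:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>bold</b> world</p>")
b = root.find("b")

# remove() detaches <b> together with its text ("bold")
# and its tail (" world"), so both vanish from the output
root.remove(b)

result = ET.tostring(root, encoding="unicode")
print(result)  # <p>Hello </p>
```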

The following function takes a tree and a filter function, and removes all subelements for which the filter returns false.

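def cleanup(elem, filter):
    # keep subelements for which filter(e) is true; splice the text,
    # children and tail of the others into elem at the same position
    out = []
    for e in elem:
        cleanup(e, filter)
        if not filter(e):
            if e.text:
                if out:
                    out[-1].tail = (out[-1].tail or "") + e.text
                else:
                    elem.text = (elem.text or "") + e.text
            out.extend(e)
            if e.tail:
                if out:
                    out[-1].tail = (out[-1].tail or "") + e.tail
                else:
                    elem.text = (elem.text or "") + e.tail
        else:
            out.append(e)
    elem[:] = out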

Note that the top element itself isn’t checked; if you need to remove that, you have to do that at the application level.

Instead of writing a filter function, you can iterate over the tree and set the tag to None for the elements you want to remove. When you’ve checked all elements, call the cleanup function as follows:

cleanup(elem, lambda e: e.tag)

In ElementTree 1.3, the serialization code leaves out the tags of elements whose tag attribute is set to None; their text, subelements, and tail are still written.
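With xml.etree.ElementTree in Python 3, which is based on ElementTree 1.3, you can see this directly. The sketch below marks the b elements for removal by setting their tag to None and then serializes the tree without calling cleanup at all; the sample markup is made up for illustration:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>bold <i>world</i></b>!</p>")

# mark every <b> element for removal; list() so the tree isn't
# modified while iter() is still walking it
for e in list(root.iter("b")):
    e.tag = None

# the serializer writes the text, subelements and tail of a
# tag-less element, but no start or end tag
result = ET.tostring(root, encoding="unicode")
print(result)  # <p>Hello bold <i>world</i>!</p>
```

Note that this only affects the serialized output; the None-tagged elements are still in the tree until you remove them, for example with the cleanup function.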

