The XMLParser API

Fredrik Lundh | August 2007

The XMLParser class provides a fast and simple XML parser. The class implements the standard consumer interface for incoming data, and calls methods on a target object for each start tag, end tag, and character data section found in the XML stream.

The cElementTree implementation of this class is 25-30% faster than Python’s standard Expat interface, and nearly twice as fast as xml.sax.

Note that in ElementTree 1.2 and earlier, this class was called XMLTreeBuilder. The old name is still available, but should be avoided in new code. To write backwards compatible code, you can do:

try:
    XMLParser = ET.XMLParser
except AttributeError:
    XMLParser = ET.XMLTreeBuilder

where ET is an ElementTree implementation, and then refer to XMLParser when creating new parsers.

Example #

The following example defines a simple “echo” target that prints each target method call. It then feeds the parser a single element, in two parts.

import xml.etree.ElementTree as ET # example

class EchoTarget:
    def start(self, tag, attrib):
        print "start", tag, attrib
    def end(self, tag):
        print "end", tag
    def data(self, data):
        print "data", repr(data)
    def close(self):
        print "close"

target = EchoTarget()
parser = ET.XMLParser(target=target)
parser.feed("<element>some ")
parser.feed("text</element>")
parser.close()

This prints:

start element {}
data 'some '
data 'text'
end element
close

Note: cElementTree 1.0.5 and earlier do not call the target’s close method when using custom targets. This has been fixed in 1.0.6.

The XMLParser Class #

parser = XMLParser()

parser = XMLParser(options)

Creates a parser instance. The following options can be used, given as keyword arguments:

target= Target object. If omitted, the parser uses an instance of the standard TreeBuilder class.

encoding= Optional encoding. If given, this value overrides the encoding specified in the XML file itself.

Implementations may support additional options. The result when using positional arguments instead of keyword arguments is undefined.
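
As a rough illustration of the two options, the following sketch creates one parser with the default target and one with an encoding override. It assumes the xml.etree.ElementTree module shipped with Python 3 (so it uses Python 3 syntax and byte strings); the sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET

# No options: the parser builds a tree using the standard TreeBuilder.
parser = ET.XMLParser()
parser.feed(b"<doc><item/></doc>")
root = parser.close()  # the element returned by the TreeBuilder

# encoding= override: these bytes are ISO-8859-1 and carry no XML
# declaration, so the parser is told explicitly how to decode them.
latin1_parser = ET.XMLParser(encoding="iso-8859-1")
latin1_parser.feed(b"<doc>caf\xe9</doc>")
latin1_root = latin1_parser.close()

print(root.tag, len(root))
print(repr(latin1_root.text))
```

Both options are passed as keyword arguments, in line with the note above about positional arguments.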

Attributes #

The parser attributes are not standardized. This section describes some common attributes. If present, they behave as documented in this section.

entity #

parser.entity (read-only but mutable dictionary)

A dictionary that contains replacement text for pre-defined named entities. The dictionary can be modified, but not replaced.

This attribute is supported by ElementTree and cElementTree.

To use this, create a parser and initialize this dictionary before you start parsing. To update the dictionary, use the update method, or set individual entries:

parser = ET.XMLParser()
parser.entity["nbsp"] = unichr(160)
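
A slightly fuller sketch of the same idea, assuming Python 3's xml.etree.ElementTree (where chr replaces unichr). The document and the demo.dtd name are hypothetical. As far as we can tell, the Expat-based parsers only consult this dictionary when the document carries a DOCTYPE with an external identifier, so the sample includes one; treat that as an assumption to verify against your own implementation.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLParser()

# Register replacement text before feeding any data.
parser.entity["nbsp"] = chr(160)           # set a single entry
parser.entity.update({"copy": chr(169)})   # or add several at once

# Hypothetical document. The DOCTYPE with a SYSTEM identifier is assumed
# to be what lets undefined entities reach the parser's lookup table.
parser.feed('<!DOCTYPE p SYSTEM "demo.dtd">')
parser.feed("<p>a&nbsp;b &copy; 2007</p>")
elem = parser.close()

print(repr(elem.text))
```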

target #

parser.target (read-only object)

The current target.

This attribute is supported by cElementTree and by ElementTree 1.3 and later.

version #

parser.version (read-only string)

Information about the underlying parser implementation. When present, this attribute should have the form “parser version”, e.g. “Expat 2.0.0”.

This attribute is supported by cElementTree and by ElementTree 1.3 and later.
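
Since these attributes are not standardized, portable code may want to probe for them instead of assuming they exist. A minimal sketch, assuming Python 3's xml.etree.ElementTree; the None fallback is our own choice for the example, not part of the API:

```python
import xml.etree.ElementTree as ET

parser = ET.XMLParser()

# Fall back to None on implementations that lack the attribute.
target = getattr(parser, "target", None)
version = getattr(parser, "version", None)

print("target:", type(target).__name__)
print("version:", version)  # something like "Expat 2.0.0"; depends on the build
```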

Methods #

The basic consumer interface (feed, close) is supported by all implementations. Specific implementations may provide additional methods, for example to allow reuse of a configured parser.

feed #

parser.feed(data)

Feeds data to the parser. The argument should be an 8-bit string buffer containing encoded data.

The parser will parse as much of the XML stream as it can on each call, and call methods on the target object accordingly.

In 1.3 and later, this method raises a ParseError exception (a subclass of SyntaxError) if the source data is malformed.

In earlier versions, the exception used is implementation dependent. cElementTree raises a SyntaxError exception, while other versions usually propagate the exception raised by the internal parser implementation (e.g. pyexpat.error for pyexpat-based parsers).
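
The sketch below shows one way to drive feed from a stream in small chunks and to catch malformed input. It assumes Python 3's xml.etree.ElementTree, where ParseError is available; parse_stream and the chunk size are hypothetical names and values chosen for the example.

```python
import io
import xml.etree.ElementTree as ET

def parse_stream(stream, chunk_size=16):
    """Feed a binary stream to a parser in small chunks (hypothetical helper)."""
    parser = ET.XMLParser()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # may raise ParseError on malformed data
    return parser.close()

good = parse_stream(io.BytesIO(b"<doc><item>one</item><item>two</item></doc>"))
print([item.text for item in good])

error = None
try:
    # The item element is never closed, so this document is malformed.
    parse_stream(io.BytesIO(b"<doc><item>one</doc>"))
except ET.ParseError as exc:
    error = exc
    print("malformed input:", exc)
```

Because ParseError derives from SyntaxError, code written against the older cElementTree behaviour (catching SyntaxError) should keep working.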

close #

parser.close() ⇒ element

Finishes feeding of data to this parser. This tells the parser to process any remaining data in the feed buffer, and then returns the value returned by the target’s close method (this is usually an element object).

In 1.3 and later, this method raises a ParseError exception if the source data is malformed. See feed for more on how exceptions are handled in earlier versions.

Note: In cElementTree 1.0.5 and earlier, this method doesn’t call the target’s close method when used with a custom target. To work around this, make sure that your target’s close method can be called twice, and use an explicit call to parser.target.close() to get data from the target. This has been fixed in cElementTree 1.0.6.
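
To illustrate how the return value travels from the target to the caller, here is a small sketch with a target that returns a dictionary instead of an element. It assumes Python 3's xml.etree.ElementTree; TagCounter is a hypothetical class made up for the example.

```python
import xml.etree.ElementTree as ET

class TagCounter:
    """Hypothetical target that counts start tags instead of building a tree."""
    def __init__(self):
        self.counts = {}
    def start(self, tag, attrib):
        self.counts[tag] = self.counts.get(tag, 0) + 1
    def end(self, tag):
        pass
    def data(self, data):
        pass
    def close(self):
        # Whatever is returned here becomes the result of parser.close().
        return self.counts

parser = ET.XMLParser(target=TagCounter())
parser.feed("<doc><item/><item/><note/></doc>")
result = parser.close()
print(result)
```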

The Target Interface #

This section describes the full target interface supported by the parser. Note that not all methods are supported by all implementations; for example, the current cElementTree parser supports more methods than the Python implementations.

The start, end, data, and close methods are supported by all implementations.

start #

target.start(tag, attr_dict)

Called for start tags. The tag is given as a universal name, the attributes as a Python dictionary.
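
A universal name combines the namespace URI and the local name as {uri}local. The following sketch records what start receives for a namespaced element. It assumes Python 3's xml.etree.ElementTree; StartRecorder and the urn:demo namespace are made up for the example.

```python
import xml.etree.ElementTree as ET

class StartRecorder:
    """Hypothetical target that records the arguments of each start call."""
    def __init__(self):
        self.calls = []
    def start(self, tag, attrib):
        self.calls.append((tag, dict(attrib)))
    def end(self, tag):
        pass
    def data(self, data):
        pass
    def close(self):
        return self.calls

parser = ET.XMLParser(target=StartRecorder())
parser.feed('<x:doc xmlns:x="urn:demo" x:id="1" lang="en"/>')
calls = parser.close()

for tag, attrib in calls:
    print(tag, attrib)
```

Note that the prefix x does not appear in the result, and that namespaced attribute keys use the same {uri}local form as tags.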

data #

target.data(text)

Called for character data and expanded character references and entities. May be called more than once for each character data section. The text string may be either an 8-bit string containing ASCII data, or a Unicode string.
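
Because data may be called several times for what looks like a single run of text, a target should normally buffer the pieces and join them later. A minimal sketch, assuming Python 3's xml.etree.ElementTree (where the text arrives as ordinary str objects); TextCollector is a hypothetical class:

```python
import xml.etree.ElementTree as ET

class TextCollector:
    """Hypothetical target that gathers all character data into one string."""
    def __init__(self):
        self.parts = []
    def start(self, tag, attrib):
        pass
    def end(self, tag):
        pass
    def data(self, data):
        # Do not assume one call per text section; just accumulate.
        self.parts.append(data)
    def close(self):
        return "".join(self.parts)

parser = ET.XMLParser(target=TextCollector())
parser.feed("<p>Tom &amp; ")             # the text is split across two feeds
parser.feed("Jerry &#169; 2007</p>")     # and contains a character reference
text = parser.close()
print(repr(text))
```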

end #

target.end(tag)

Called for end tags.

close #

target.close() ⇒ object

Called when the parser is done. The return value represents the built structure, and can be any kind of object (including None). It is passed on by the parser’s close method.

xml #

target.xml(encoding, standalone)

Called when the parser sees the XML declaration. Not supported by current releases.

doctype #

target.doctype(name, public_identifier, system_identifier)

Called when the parser sees the doctype declaration. Supported by ElementTree 1.3 and later.

pi #

target.pi(target, data)

Called for processing instructions. Supported by cElementTree 1.0.3 and later. Not supported by ElementTree.

comment #

target.comment(text)

Called for comment blocks. Supported by cElementTree 1.0.3 and later. Not supported by ElementTree.
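
The optional methods can be tried out with a target that simply logs each call. The sketch below assumes Python 3's xml.etree.ElementTree, where doctype, pi, and comment are, to our knowledge, all passed on to custom targets; support in the older releases listed above differs. EventLog and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

class EventLog:
    """Hypothetical target that logs the calls it receives."""
    def __init__(self):
        self.events = []
    def doctype(self, name, public_identifier, system_identifier):
        self.events.append(("doctype", name, public_identifier, system_identifier))
    def comment(self, text):
        self.events.append(("comment", text))
    def pi(self, target, data):
        self.events.append(("pi", target, data))
    def start(self, tag, attrib):
        self.events.append(("start", tag))
    def end(self, tag):
        self.events.append(("end", tag))
    def data(self, data):
        pass
    def close(self):
        return self.events

parser = ET.XMLParser(target=EventLog())
parser.feed('<!DOCTYPE doc SYSTEM "doc.dtd">')
parser.feed("<doc><!-- note --><?proc do?></doc>")
events = parser.close()

for event in events:
    print(event)
```

A target that must run on several implementations can define these methods unconditionally; an implementation that does not support them simply never calls them.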

 
