This is an old copy of the Python FAQ. The information here may be outdated.

How do I get data out of HTML?

Try Beautiful Soup. It is more forgiving than other parsers in that it won’t choke on bad markup.
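
As a rough illustration of that forgiving behaviour, here is a minimal sketch. It assumes the current Beautiful Soup 4 package (installed as beautifulsoup4, imported as bs4) rather than whatever version this FAQ entry was written against, and the sample markup is made up for the example.

```python
# Assumes the third-party package is installed: pip install beautifulsoup4
from bs4 import BeautifulSoup

# Hypothetical sample with deliberately sloppy markup (unclosed <p> and <li> tags).
html = """
<html><head><title>Sample page</title></head>
<body>
<p>First paragraph
<p>Second paragraph with a <a href="http://www.python.org">link</a>
<ul><li>one<li>two</ul>
</body></html>
"""

# "html.parser" selects Python's built-in parser, so no extra parser is required.
soup = BeautifulSoup(html, "html.parser")

# Pull out the page title and the target of every link.
title = soup.title.string
links = [a.get("href") for a in soup.find_all("a")]

print(title)
print(links)
```

The unclosed tags do not raise an error; the title and the link can still be read from the resulting tree.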

If you want to parse HTML into a structure compatible with Python’s ElementTree library, you can use the ElementSoup adapter.
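
ElementSoup dates from the Python 2 era and may not be installable on a current interpreter, so the sketch below does not use it. Instead it shows the same general idea, turning HTML into ElementTree elements, with only the standard library. The class name ElementTreeHTMLParser, the VOID_TAGS set, and the sample markup are all hypothetical, and the handling of bad markup is far cruder than what Beautiful Soup does.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML (a partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ElementTreeHTMLParser(HTMLParser):
    """Hypothetical helper: feed HTML parse events into an ElementTree TreeBuilder."""

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()
        self._open = []  # names of the currently open tags, outermost first

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; store them as "".
        self._builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID_TAGS:
            self._builder.end(tag)
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return  # stray closing tag: ignore it
        # Close any unclosed children first, then the tag itself.
        while self._open:
            name = self._open.pop()
            self._builder.end(name)
            if name == tag:
                break

    def handle_data(self, data):
        if self._open:  # skip text outside the root element
            self._builder.data(data)

    def close(self):
        super().close()
        # Close whatever is still open, then hand back the root element.
        while self._open:
            self._builder.end(self._open.pop())
        return self._builder.close()


# Hypothetical sample; note the unclosed <p> and the bare <br>.
html = """<html><body>
<p>See <a href="http://www.python.org">Python</a><br>
and <a href="http://example.com/">an example</a>
</body></html>"""

parser = ElementTreeHTMLParser()
parser.feed(html)
root = parser.close()

# From here on, the usual ElementTree API applies.
hrefs = [a.get("href") for a in root.findall(".//a")]
print(hrefs)
```

Once the tree is built, findall, iter, and the rest of the ElementTree API work as they would on a tree parsed from XML.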



