Term Extraction

Fredrik Lundh | November 2005 | Originally posted to online.effbot.org


Erik Stattin linked to this page (dead link), which led me to this page (dead link), which reminded me of this, which inspired me to whip up this little script:

# File: YahooTermExtraction.py
# An interface to Yahoo's Term Extraction service:
# http://developer.yahoo.net/search/content/V1/termExtraction.html
# "The Term Extraction Web Service provides a list of significant
# words or phrases extracted from a larger content."

import urllib
try:
    from xml.etree import ElementTree # 2.5 and later
except ImportError:
    from elementtree import ElementTree

URI = "http://api.search.yahoo.com"
URI = URI + "/ContentAnalysisService/V1/termExtraction"

def termExtraction(appid, context, query=None):
    d = dict(
        appid=appid,
        context=context.encode("utf-8")
        )
    if query:
        d["query"] = query.encode("utf-8")
    result = []
    f = urllib.urlopen(URI, urllib.urlencode(d))
    for event, elem in ElementTree.iterparse(f):
        if elem.tag == "{urn:yahoo:cate}Result":
            result.append(elem.text)
    return result
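
To see what the parsing loop does without calling the service, here is a small sketch that runs the same iterparse loop over a canned response. The sample XML is an assumption (only the {urn:yahoo:cate}Result tag comes from the script above), and parse_terms is a hypothetical helper name, not part of the module.

```python
import io
from xml.etree import ElementTree

# Hypothetical response body, made up for illustration. Only the
# {urn:yahoo:cate}Result tag name is taken from the script.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<ResultSet xmlns="urn:yahoo:cate">
  <Result>horrible picture</Result>
  <Result>logo</Result>
  <Result>spammer</Result>
</ResultSet>
"""

def parse_terms(stream):
    # Same loop as in termExtraction, minus the HTTP request.
    # iterparse reports "end" events by default, so elem.text is
    # complete by the time each Result element shows up here.
    result = []
    for event, elem in ElementTree.iterparse(stream):
        if elem.tag == "{urn:yahoo:cate}Result":
            result.append(elem.text)
    return result

terms = parse_terms(io.BytesIO(SAMPLE))
print(terms)
```

The default namespace on the root element is what turns a plain Result tag into "{urn:yahoo:cate}Result", which is why the loop compares against the qualified name.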


>>> import urllib
>>> from YahooTermExtraction import termExtraction
>>> appid = "/your app id/"
>>> uri = "/some uri/"
>>> text = urllib.urlopen(uri).read()
>>> termExtraction(appid, text)[-5:]
['horrible picture', 'logo', 'spammer', 'moron', 'cat mouse']

(For best results, you should probably run the text through an HTML-to-text conversion before you send it to Yahoo. Some variation of this script might be useful.)
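
As a rough idea of what such a conversion could look like, here is a minimal sketch built on the standard library's HTMLParser. It is not the script referred to above. TextExtractor and html_to_text are hypothetical names, and the approach is deliberately crude: it keeps the text nodes, skips script and style contents, and collapses whitespace.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

class TextExtractor(HTMLParser):
    # Collects text nodes, ignoring anything inside <script> or <style>.

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []
        self.skip = 0  # nesting depth of script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join chunks with spaces so adjacent blocks don't run together,
    # then collapse all runs of whitespace into single spaces.
    return " ".join(" ".join(parser.chunks).split())

page = ("<html><head><style>p {color: red}</style></head>"
        "<body><p>Hello <b>world</b></p>"
        "<script>var x = 1;</script></body></html>")
print(html_to_text(page))
```

With something like this in place, the call in the session above would become termExtraction(appid, html_to_text(text)). Joining chunks with spaces can split a word that has inline markup in the middle of it, which seemed like the lesser evil for term extraction.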

