Term Extraction
Fredrik Lundh | November 2005 | Originally posted to online.effbot.org
Erik Stattin linked to this page which led me to this page which reminded me of this which inspired me to whip up this little script:
    # File: YahooTermExtraction.py
    #
    # An interface to Yahoo's Term Extraction service:
    #
    # http://developer.yahoo.net/search/content/V1/termExtraction.html
    #
    # "The Term Extraction Web Service provides a list of significant
    # words or phrases extracted from a larger content."
    #

    import urllib
    from elementtree import ElementTree

    URI = "http://api.search.yahoo.com"
    URI = URI + "/ContentAnalysisService/V1/termExtraction"

    def termExtraction(appid, context, query=None):
        d = dict(
            appid=appid,
            context=context.encode("utf-8")
            )
        if query:
            d["query"] = query.encode("utf-8")
        result = []
        f = urllib.urlopen(URI, urllib.urlencode(d))
        for event, elem in ElementTree.iterparse(f):
            if elem.tag == "{urn:yahoo:cate}Result":
                result.append(elem.text)
        return result
Usage:
    >>> import urllib
    >>> from YahooTermExtraction import termExtraction
    >>> appid = "/your app id/"
    >>> uri = "/some uri/"
    >>> text = urllib.urlopen(uri).read()
    >>> termExtraction(appid, text)[-5:]
    ['horrible picture', 'logo', 'spammer', 'moron', 'cat mouse']
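The script above targets the Python of 2005 (`urllib.urlopen` and the standalone `elementtree` package). For readers on Python 3, here is a rough sketch of the same idea using only the standard library. The names `build_post_data`, `parse_results` and `term_extraction` are my own, the sample response in the demo is made up to match the `{urn:yahoo:cate}Result` tag used above, and the endpoint may well no longer be reachable, so treat this as an illustration rather than a tested client:

```python
import io
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Same endpoint and result tag as the original script.
URI = "http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction"
RESULT_TAG = "{urn:yahoo:cate}Result"

def build_post_data(appid, context, query=None):
    # urlencode() encodes str values as UTF-8 by default, so the
    # explicit .encode("utf-8") calls from the original are not needed.
    params = {"appid": appid, "context": context}
    if query:
        params["query"] = query
    return urllib.parse.urlencode(params).encode("ascii")

def parse_results(stream):
    # Collect the text of every Result element in the response stream.
    return [
        elem.text
        for event, elem in ET.iterparse(stream)
        if elem.tag == RESULT_TAG
    ]

def term_extraction(appid, context, query=None):
    # Passing a data argument makes urlopen() issue a POST request.
    data = build_post_data(appid, context, query)
    with urllib.request.urlopen(URI, data) as f:
        return parse_results(f)

if __name__ == "__main__":
    # Offline demo with a hypothetical response body (shape assumed).
    sample = (
        b'<ResultSet xmlns="urn:yahoo:cate">'
        b"<Result>cat mouse</Result><Result>logo</Result>"
        b"</ResultSet>"
    )
    print(parse_results(io.BytesIO(sample)))
```

Keeping the parsing step separate from the network call makes it possible to try the code against a saved or hand-written response, which is handy when the service itself is not available.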
(For best results, you should probably run the text through an HTML-to-text conversion before you send it to Yahoo. Some variation of this script might be useful.)
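As a stand-in for such a conversion step, here is a minimal sketch built on Python 3's `html.parser`. It is not the script referred to above; the names `TextExtractor` and `html_to_text` are invented for this example, and it makes the simplifying assumption that dropping `script` and `style` content and collapsing whitespace is good enough:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Elements whose content is not visible page text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep character data only when outside script/style.
        if not self.skip_depth:
            self.parts.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with spaces so adjacent blocks do not run together,
    # then collapse runs of whitespace. A word split by inline markup
    # will end up as two words; that is a known limit of this sketch.
    return " ".join(" ".join(parser.parts).split())
```

The result of `html_to_text(page)` (with `page` as a decoded string) would then be passed as the `context` argument instead of the raw HTML.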
