How to parse data from HTML

I'm looking for a way to extract data from HTML.

Thanks

Rob

You can use Python's built-in HTMLParser module, but if you're not comfortable with programming it probably won't be an easy task. See the Python documentation for the HTMLParser module for details on how it works.

The following is a quick example that you can put into the actionPerformed of a button to see it in action:

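# Python 2 / Jython module name; in Python 3 it is html.parser
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    # Called once for every opening tag, e.g. <div>
    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    # Called once for every closing tag, e.g. </div>
    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

parser = MyHTMLParser()
parser.feed("<html><body><div>hi</div></body></html>")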

The following will be printed to the console:

Encountered the beginning of a html tag
Encountered the beginning of a body tag
Encountered the beginning of a div tag
Encountered the end of a div tag
Encountered the end of a body tag
Encountered the end of a html tag
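
The example above only reports tags. To actually pull data out, you would also override handle_data, which receives the text between tags. Below is a rough sketch of that idea, not a definitive implementation. The names DivTextParser and extract_div_text are made up for illustration, and the assumption is that you want the text inside every div; adjust the tag check to suit your HTML. The import is written so it should work in both Python 2 (Jython) and Python 3.

```python
# Import works on both Python 2 (including Jython) and Python 3.
try:
    from HTMLParser import HTMLParser      # Python 2 / Jython
except ImportError:
    from html.parser import HTMLParser     # Python 3


class DivTextParser(HTMLParser):
    """Hypothetical helper: collects the text found inside <div> tags."""

    def __init__(self):
        # Call the base initializer directly; HTMLParser is an
        # old-style class in Python 2, so super() is avoided here.
        HTMLParser.__init__(self)
        self.depth = 0    # number of <div> tags currently open
        self.texts = []   # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only while at least one <div> is open.
        if self.depth > 0 and data.strip():
            self.texts.append(data.strip())


def extract_div_text(html):
    """Hypothetical wrapper: return a list of text found inside <div> tags."""
    parser = DivTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


print(extract_div_text(
    "<html><body><div>hi</div><p>skip</p><div>there</div></body></html>"))
```

Tracking a depth counter instead of a simple flag means nested div tags are handled too. The attrs argument of handle_starttag is a list of (name, value) pairs, so the same pattern can be extended to filter on an id or class attribute.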