HTMLParser raises "malformed start tag" in Python 2.6.9 but not in 2.7.4

I am fetching the contents of a URL with urllib2 and then feeding them to Python's built-in HTMLParser. The code works fine on my machine with Python 2.7.4, but on my friend's machine, which has Python 2.6.9, it fails with:

Traceback (most recent call last):
  File "opsview_audit.py", line 420, in <module>
    check_instances_against_regex(instances)
  File "opsview_audit.py", line 219, in check_instances_against_regex
    attrs_being_monitored = get_host_monitoring_status(cred['url'], running_instances,
    cred['user_name'], cred['pass_key'])
  File "opsview_audit.py", line 112, in get_host_monitoring_status
    parser.feed(result.read())
  File "/usr/lib64/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib64/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib64/python2.6/HTMLParser.py", line 229, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib64/python2.6/HTMLParser.py", line 304, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib64/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 509, column 47

Maybe one of the start tags in the page is not well formed, and Python 2.6.9 raises an exception for it while 2.7.4 does not.
Upgrading from 2.6.9 to 2.7.4 or later is not an option here.

Two solutions:

-Use another HTML parser such as BeautifulSoup 3 or lxml. Both are easy to learn and compatible with Python 2.6.

-Try to find the malformed tag in the page (the error message gives its line and column) and filter it out before parsing.
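
For the second option, a small helper can show what the parser was looking at when it gave up. This is only a sketch: the function name show_error_context, its radius parameter, and the sample string are made up for illustration and are not part of HTMLParser. It assumes the position follows the getpos() convention, where the line number is 1-based and the offset is 0-based.

```python
def show_error_context(html, lineno, offset, radius=40):
    """Return the text around a parser position, with a caret under it.

    lineno is 1-based and offset is 0-based, matching HTMLParser.getpos().
    """
    lines = html.splitlines()
    line = lines[lineno - 1]
    start = max(0, offset - radius)
    snippet = line[start:offset + radius]
    # Build a second line with a caret pointing at the reported offset.
    pointer = " " * (offset - start) + "^"
    return snippet + "\n" + pointer


# Hypothetical sample input; in the question the text would come from
# result.read() instead.
sample = "<html>\n<body>\n<p>ok</p><a href=x'y>link</a>\n</body>"
print(show_error_context(sample, 3, 9))
```

As far as I recall, the column printed in the HTMLParseError message is the offset plus one, so for the error in the question ("line 509, column 47") the call would be show_error_context(page, 509, 46), where page is the string that was passed to parser.feed(). Once the offending tag is visible, it can be repaired or stripped from the string before parsing.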
