import urllib
from urllib.request import urlopen
address='http://www.iitb.ac.in/acadpublic/RunningCourses.jsp?deptcd=EE&year=2012&semester=1'
source= urlopen(address).read()
source=str(source)
from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        x=str(data)
        if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t'):
            print("Encountered some data:",x)
parser = MyHTMLParser(strict=False)
parser.feed(source)
The above code isn't working. It is still printing '\\r\\n\\t\\t\\t\\t' stuff. Any suggestions?
if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t')
should be
if x not in ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t')
or better:
if not x.isspace()
Your original condition is evaluated as:
if (x != ('\r\n\t\t\t\t')) or '\r\n\t\t\t\t\t' or '\r\n\r\n\t\t\t'
Notice that the last two values are evaluated on their own, not compared with x. Only an empty string evaluates to False,
so these non-empty strings are always truthy and the condition always passes.
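To make the difference concrete, here is a small sketch that runs the three conditions side by side. The function names and the sample strings are made up for illustration; only the conditions themselves come from the posts above.

```python
# The three whitespace runs from the question.
WHITESPACE_RUNS = ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t')

def buggy_check(x):
    # Parsed as (x != A) or B or C. B is a non-empty string, so the
    # whole expression is truthy no matter what x is.
    return bool(x != '\r\n\t\t\t\t' or '\r\n\t\t\t\t\t' or '\r\n\r\n\t\t\t')

def membership_check(x):
    # True only when x is none of the listed runs.
    return x not in WHITESPACE_RUNS

def isspace_check(x):
    # True when x contains at least one non-whitespace character
    # (also True for the empty string, since ''.isspace() is False).
    return not x.isspace()

for sample in ('\r\n\t\t\t\t', '\r\n\t\t', 'EE 101'):
    print(repr(sample), buggy_check(sample),
          membership_check(sample), isspace_check(sample))
```

Note that the membership test still lets through a whitespace run that is not in the tuple (such as '\r\n\t\t'), which is why the isspace() version is the more robust of the two.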
Maybe the number of \\t, \\r, etc. characters varies. Try this:
if x.replace('\r','').replace('\n','').replace('\t','').strip():
    print("Encountered some data:",x)
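One more thing that may be contributing: calling str() on the bytes returned by urlopen(...).read() produces the bytes repr (a string starting with b' and containing literal backslash sequences), so the parser might never see real carriage returns or tabs at all. Below is a minimal sketch that decodes the bytes first and then skips whitespace-only data. The DataCollector class, the sample bytes, and the UTF-8 encoding are assumptions for illustration; the real page may use a different encoding.

```python
from html.parser import HTMLParser

class DataCollector(HTMLParser):
    """Collects text nodes, skipping whitespace-only ones (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only data that has at least one non-whitespace character.
        if not data.isspace():
            self.chunks.append(data.strip())

# Hypothetical sample standing in for urlopen(address).read().
raw = b"<table>\r\n\t\t\t\t<tr>\r\n\t\t\t\t\t<td>EE 101</td>\r\n\r\n\t\t\t</tr></table>"

# Decode instead of str(raw); UTF-8 is an assumption about the page.
text = raw.decode("utf-8", errors="replace")

collector = DataCollector()
collector.feed(text)
collector.close()
print(collector.chunks)
```

With decoded text, the whitespace between tags arrives in handle_data as actual '\r', '\n' and '\t' characters, so either the isspace() test or the replace-and-strip test above can filter it out.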