
Python 3 Special characters escaping

import urllib
from urllib.request import urlopen


address='http://www.iitb.ac.in/acadpublic/RunningCourses.jsp?deptcd=EE&year=2012&semester=1'
source= urlopen(address).read()
source=str(source)


from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
        def handle_data(self, data):
            x=str(data)
            if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t'):
                print("Encountered some data:",x)

parser = MyHTMLParser(strict=False)
parser.feed(source)

The above code isn't working. It is still printing '\r\n\t\t\t\t' stuff. Any suggestions?
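One thing worth checking in the question code: `urlopen(address).read()` returns `bytes`, and calling `str()` on a bytes object returns its repr (`b'...'`). In that repr every CR, LF and tab is spelled out as a literal backslash followed by a letter, which would explain why `\r\n\t...` appears verbatim in the output and why no comparison against real whitespace can ever match. Decoding keeps the real control characters. The sketch below uses a hard-coded byte string as a stand-in for the download and assumes the page is UTF-8; both are assumptions, not taken from the original site.

```python
# Stand-in for urlopen(address).read(): a bytes object with real \r, \n, \t bytes.
raw = b"<td>\r\n\t\t\t\tEE101</td>"

as_str = str(raw)              # repr of the bytes: "b'<td>\\r\\n...'"
as_text = raw.decode("utf-8")  # assumed UTF-8; gives real control characters

print(as_str.startswith("b'"))  # the repr prefix leaks into the "HTML"
print("\\r" in as_str)          # a literal backslash + 'r' is present
print("\r" in as_str)           # but no real carriage return survives
print("\r" in as_text)          # decoding keeps the real carriage return
```

With a real response, the charset can usually be read from the headers (`response.headers.get_content_charset()`) instead of hard-coding `"utf-8"`.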

if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t')

should be

if x not in ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t')

or better:

if not x.isspace()
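A small comparison of the two suggested filters on a few sample strings. The sample values are made up for illustration; the three strings in `SKIP` are the ones from the question.

```python
# The three whitespace runs listed in the question.
SKIP = ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t')


def keep_by_membership(x):
    # True unless x is exactly one of the listed strings.
    return x not in SKIP


def keep_by_isspace(x):
    # str.isspace() is True only for a non-empty, all-whitespace string.
    return not x.isspace()


samples = ['\r\n\t\t\t\t', '\r\n\t\t', 'EE 101']
for x in samples:
    print(repr(x), keep_by_membership(x), keep_by_isspace(x))
```

The second sample, a whitespace run that is not in the list, gets through the membership test but is rejected by `isspace()`. One caveat: `''.isspace()` is `False`, so an empty string would still pass `not x.isspace()`; `if x.strip():` rejects it as well.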

Your original condition is evaluated as:

if (x != ('\r\n\t\t\t\t')) or '\r\n\t\t\t\t\t' or '\r\n\r\n\t\t\t'

Notice that the last two operands are not compared with x at all; they are evaluated as plain strings. Only an empty string is falsy, so these non-empty strings are always true, and the condition always passes.
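This can be checked directly by feeding the original condition the very string it was meant to skip. Note also that the parentheses around each string do not create tuples; `('abc')` is just `'abc'`.

```python
x = '\r\n\t\t\t\t'  # exactly the string the question tries to skip

# != binds tighter than `or`, so this is (x != '...') or '...' or '...'.
cond = x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t')

print(x != '\r\n\t\t\t\t')  # False: the comparison itself works
print(repr(cond))           # `or` returns its first truthy operand: the second string
print(bool(cond))           # True, so the print branch always runs
```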

Maybe the number of \t and \r characters varies. Try this:

if x.replace('\r','').replace('\n','').replace('\t','').strip():
    print("Encountered some data:",x)
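Putting the pieces together, here is a sketch of a complete parser that decodes the bytes instead of calling `str()` on them and skips whitespace-only text with `strip()`. The class name, the `texts` attribute and the sample HTML are hypothetical, and the page is assumed to be UTF-8. It also drops `strict=False`: on current Python 3 versions `HTMLParser` no longer accepts a `strict` argument (it was removed in 3.5), so that call raises a `TypeError`.

```python
from html.parser import HTMLParser


class CourseTextParser(HTMLParser):
    """Collects text nodes that contain something other than whitespace."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True is the default in Python 3.5+
        self.texts = []

    def handle_data(self, data):
        # strip() removes spaces, \r, \n and \t at both ends; an empty result
        # means the node was pure layout whitespace, so it is skipped.
        if data.strip():
            self.texts.append(data.strip())


# Hypothetical stand-in for urlopen(address).read(); nothing is fetched here.
raw = (b"<table>\r\n\t\t\t\t<tr>\r\n\t\t\t\t\t<td>EE 101</td>"
       b"\r\n\r\n\t\t\t<td>Circuits</td></tr></table>")

parser = CourseTextParser()
parser.feed(raw.decode("utf-8"))  # decode the bytes rather than str(raw)
parser.close()                    # flush any text still buffered by the parser
print(parser.texts)               # ['EE 101', 'Circuits']
```

Collecting the text into a list rather than printing it inside `handle_data` makes the result easy to inspect or post-process afterwards.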
