
Encoding Error, Beautiful soup in Python

I have an HTML document that is parsed with BeautifulSoup, and I look up certain labels in it.

        availabilityList = []
        for label in soup.find(id=studyroom).select('li.zone label'):
            a = label.get_text()
            b = a.encode('ascii','ignore')
            availabilityList.extend(b)
        #this part below doesn't work
        ','.join(availabilityList)

I used encode to remove the u at the beginning of each string in the list, but I still get a weird error.

Printing availabilityList gives:

['R', 'o', 'o', 'm', ' ', '2', '2', '5', ' ', '1', '0', ':', '0', '0', ' ', 'A', 'M', 'R', 'o', 'o', 'm', ' ', '2', '2', .....]

I just need a list of strings. The join function doesn't work.

availabilityList = ['Room 225 10:00 AM', 'Room 225 11:00 AM'...]
availabilityList.extend(b)

treats b as an iterable, in this case a sequence of characters, and extends availabilityList with each character in turn.

You need to be doing:

availabilityList.append(b)

This is what I mean:

>>> a_list = []
>>> a = 'text'
>>> a_list.append(a)
>>> a_list
['text']
>>> b = 'new_text'
>>> a_list.extend(b)
>>> a_list
['text', 'n', 'e', 'w', '_', 't', 'e', 'x', 't']

Note the difference between what append and extend did.
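Applied to the loop from the question, the fix might look like the sketch below. To keep it runnable without the original page, the list label_texts is a hypothetical stand-in for the strings that label.get_text() would return; the variable names are also illustrative, not from the original code.

```python
# Hypothetical stand-ins for the strings returned by label.get_text().
label_texts = [u'Room 225 10:00 AM', u'Room 225 11:00 AM']

availability_list = []
for text in label_texts:
    # append adds the whole string as a single element;
    # extend would iterate over it and add one character at a time.
    availability_list.append(text)

# join returns a new string and leaves the list unchanged,
# so the result has to be assigned or printed to be of any use.
joined = ','.join(availability_list)

print(availability_list)
print(joined)
```

Note that a bare ','.join(availabilityList) statement, as in the question, discards its result even when the list is built correctly.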

I do not think the problem is with BeautifulSoup; it comes from using extend where append is needed.

The corrected line inside the loop would be: availabilityList.append(b)

Basically, the string in b gets treated as a sequence of characters, and each character gets added individually to the end of availabilityList. Take a look here to see the difference between extend and append.
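As a rough illustration of that behaviour, the sketch below shows that extend on a string has the same effect as appending each character in a loop, and that wrapping the string in a list makes extend add it as one element. The value of b and the list names are made up for the example.

```python
b = 'Room 225'  # illustrative value, not taken from the real page

chars = []
chars.extend(b)      # iterates over the string, one character at a time

looped = []
for ch in b:         # the same effect, written out explicitly
    looped.append(ch)

whole = []
whole.extend([b])    # a one-element list, so the string is added whole

print(chars == looped)
print(whole)
```

In practice append(b) is the clearer choice; extend([b]) is shown only to make the iteration visible.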
