
"Expected string or buffer" error using Beautiful Soup

I'm trying to write code that pulls numbers from a URL using Beautiful Soup and then sums them, but I keep getting an error that looks like this:

Expected string or buffer

I think it's related to the regular expressions, but I can't pinpoint the problem.

import re
import urllib

from BeautifulSoup import *
htm1 = urllib.urlopen('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/comments_42.html').read()
soup = BeautifulSoup(htm1)
tags = soup('span')

for tag in tags:
    y = re.findall ('([0-9]+)',tag.txt)

print sum(y)

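I recommend bs4 (Beautiful Soup 4, imported with from bs4 import BeautifulSoup) instead of BeautifulSoup (which is the old version). re.findall also expects a string as its second argument, so you need to change this line: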

y = re.findall ('([0-9]+)',tag)

to something like this:

y = re.findall ('([0-9]+)',tag.text)
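To see why the change matters, here is a minimal sketch that reproduces the error without any network access or parsing library. FakeTag is a hypothetical stand-in for a parsed tag (it is not part of Beautiful Soup); the only thing it shares with a real tag is a .text attribute holding a string. On Python 3 the TypeError message reads "expected string or bytes-like object" rather than "expected string or buffer".

```python
import re


class FakeTag:
    """Hypothetical stand-in for a parsed tag: an object with a .text string."""

    def __init__(self, text):
        self.text = text


tag = FakeTag("97")

try:
    # Passing the object itself: re only accepts str or bytes.
    re.findall('([0-9]+)', tag)
    error_raised = False
except TypeError:
    error_raised = True

# Passing the tag's text (a plain string) works.
numbers = re.findall('([0-9]+)', tag.text)

print(error_raised, numbers)
```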

See if this gets you further:

total = 0  # running total; named total so it does not shadow the built-in sum()
for tag in tags:
    y = re.findall('([0-9]+)', tag.text)  # get the text from the tag
    if y:  # skip tags whose text contains no digits
        print(y[0])  # y is a list of strings, print the first element
        total += int(y[0])  # convert it to an integer and add it to the total

print('the sum is: {}'.format(total))
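The loop above adds only the first number found in each tag. If a tag's text could hold several numbers, one possible variation is to add up every match. The helper below is a sketch, not part of the original answer: sum_numbers and sample_texts are names made up for illustration, and the function takes plain strings so it can be tried without downloading the page.

```python
import re


def sum_numbers(texts):
    """Sum every run of digits found in an iterable of strings."""
    total = 0
    for text in texts:
        # findall returns a list of digit strings, possibly empty
        for match in re.findall('[0-9]+', text):
            total += int(match)
    return total


# Made-up sample strings standing in for the text of each span tag
sample_texts = ["97", "12 and 3", "no digits here"]
print(sum_numbers(sample_texts))
```

With the parsed page from the question, it could be called as sum_numbers(tag.text for tag in tags).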
