I'm trying a code that will pull numbers from a URL using Beautiful Soup, then sum these numbers, but I keep getting an error that looks like this:
Expected string or buffer
I think it's related to the regular expressions, but I can't pinpoint the problem.
import re
import urllib
from BeautifulSoup import *
htm1 = urllib.urlopen('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/comments_42.html').read()
soup = BeautifulSoup(htm1)
tags = soup('span')
for tag in tags:
y = re.findall ('([0-9]+)',tag.txt)
print sum(y)
I recommend bs4
instead of BeautifulSoup
(which is the old version). You also need to change this line:
y = re.findall ('([0-9]+)',tag)
to something like this:
y = re.findall ('([0-9]+)',tag.text)
See if this gets you further:
sum = 0 #initialize the sum
for tag in tags:
y = re.findall ('([0-9]+)',tag.text) #get the text from the tag
print(y[0]) #y is a list, print the first element of the list
sum += int(y[0]) #convert it to an integer and add it to the sum
print('the sum is: {}'.format(sum))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.