Scraping Numbers from HTML using BeautifulSoup in Python
Here is the code:
import urllib.request
import re
from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1385959.html').read()
soup = BeautifulSoup(html, "html.parser")
sum = 0
# Retrieve all of the anchor tags
tags = soup('<tr>')
for tag in tags:
    # Look at the parts of a tag
    y = str(tag)
    x = re.findall("[0-9]+", y)
    for i in x:
        i = int(i)
        sum = sum + i
print(sum)
Why does it count wrongly? The number it prints is not what it should be.
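You should replace soup('<tr>') with soup('tr'). Calling soup(...) is shorthand for soup.find_all(...), which matches tag names, and tag names are written without angle brackets. No tag is named "<tr>", so the result list is empty, the loop body never runs, and the script prints 0.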
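A small self-contained sketch makes the difference visible. It runs against a hypothetical inline snippet (SAMPLE_HTML) instead of the real page, whose markup is only assumed here to be a table of names and comment counts. The helper sum_row_numbers is also hypothetical: it applies the soup('tr') fix and, as an extra precaution not part of the answer above, scans each row's text via get_text(" ") rather than str(tag), so digits that appear only in markup (tag names, attribute values) are not added to the total.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample: the real page's markup is an assumption here.
SAMPLE_HTML = """
<table>
<tr><td>Name</td><td>Comments</td></tr>
<tr><td>Romina</td><td><span class="comments">97</span></td></tr>
<tr><td>Laurie</td><td><span class="comments">97</span></td></tr>
<tr><td>Bayli</td><td><span class="comments">90</span></td></tr>
</table>
"""


def sum_row_numbers(html) -> int:
    """Sum every run of digits in the visible text of each <tr> row."""
    soup = BeautifulSoup(html, "html.parser")
    total = 0  # named 'total' to avoid shadowing the built-in sum()
    # Tag names are given without angle brackets: 'tr', not '<tr>'.
    for row in soup("tr"):
        # get_text(" ") drops the markup and puts a space between cells,
        # so numbers in neighbouring cells are not glued together.
        for digits in re.findall(r"[0-9]+", row.get_text(" ")):
            total += int(digits)
    return total


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(len(soup("<tr>")))  # 0: no tag is literally named "<tr>"
print(len(soup("tr")))    # 4 rows, including the header row
print(sum_row_numbers(SAMPLE_HTML))  # 97 + 97 + 90 = 284
```

For the real page, pass the bytes returned by urllib.request.urlopen(...).read() instead of SAMPLE_HTML; BeautifulSoup accepts bytes as well as str.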