Scraping numbers from HTML with BeautifulSoup in Python

Question

Here is the code:

import urllib.request
import re
from bs4 import BeautifulSoup


html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1385959.html').read()
soup = BeautifulSoup(html, "html.parser")

sum=0
# Retrieve all of the table-row tags
tags = soup('<tr>')
for tag in tags:
    # Look at the parts of a tag
    y=str(tag)
    x= re.findall("[0-9]+",y)
    for i in x:
        i=int(i)
        sum=sum+i
print(sum)
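The problem can be reproduced offline with a small sketch. It swaps the live page for a hypothetical `SAMPLE_HTML` string, which assumes one `<tr>` per person with the count inside a `<span>`; the real page's markup may differ. It shows that `soup("<tr>")` matches nothing, so the loop never runs and the total stays 0.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the assignment page
# (assumed shape: a header row, then one <tr> per person).
SAMPLE_HTML = """
<table>
<tr><td>Name</td><td>Comments</td></tr>
<tr><td>Romina</td><td><span class="comments">97</span></td></tr>
<tr><td>Laurie</td><td><span class="comments">90</span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# soup(...) is shorthand for soup.find_all(...), which compares against
# tag *names*. No tag is named "<tr>", so this is an empty list.
buggy_tags = soup("<tr>")

total = 0
for tag in buggy_tags:  # the loop body never runs
    for number in re.findall("[0-9]+", str(tag)):
        total += int(number)

print(len(buggy_tags), total)
```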

Why does it count wrongly? The number in the result is not what it should be.

Answer 1

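You should replace soup('<tr>') with soup('tr'). Calling soup(...) is shorthand for soup.find_all(...), which expects a bare tag name. soup('<tr>') looks for a tag literally named "<tr>", finds none, and returns an empty list, so the loop never runs and sum stays 0.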
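A minimal sketch of the corrected approach follows. It again uses a hypothetical `SAMPLE_HTML` string instead of the live URL; the helper names `sum_row_numbers` and `sum_span_numbers` are illustrative only. The second helper is an alternative that assumes each count sits alone inside a `<span>` (an assumption about the page's markup). It reads only those spans, so digits elsewhere in a row (a name or an attribute value) cannot leak into the total, which the regex-over-the-whole-row approach would otherwise pick up.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the assignment page.
SAMPLE_HTML = """
<table>
<tr><td>Name</td><td>Comments</td></tr>
<tr><td>Romina</td><td><span class="comments">97</span></td></tr>
<tr><td>Laurie</td><td><span class="comments">90</span></td></tr>
</table>
"""


def sum_row_numbers(html):
    """Question's approach, fixed: sum every run of digits in each <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    total = 0
    for row in soup("tr"):  # bare tag name, no angle brackets
        total += sum(int(n) for n in re.findall(r"[0-9]+", str(row)))
    return total


def sum_span_numbers(html):
    """Alternative: sum only the text inside <span> tags
    (assumes every span holds a single integer)."""
    soup = BeautifulSoup(html, "html.parser")
    return sum(int(span.get_text()) for span in soup("span"))


print(sum_row_numbers(SAMPLE_HTML))   # 97 + 90
print(sum_span_numbers(SAMPLE_HTML))  # 97 + 90
```

To run either helper against the real page, pass it the bytes from `urllib.request.urlopen(...).read()` instead of `SAMPLE_HTML`; BeautifulSoup accepts bytes as well as strings.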

Solution 1
0 2022-01-07 21:33:59