What is the ideal way to work with XML data when parsing HTML in Python with Beautiful Soup?

Question

What is the ideal way to convert XML to text when parsing HTML in Python with Beautiful Soup?

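When I parse HTML with the BeautifulSoup library in Python 2.7, I can get as far as creating the "soup", but I don't know how to extract the data I need from it, so I tried converting everything to a string.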

In the example below, I want to extract all the numbers inside the span tags and add them up. Is there a better way?

Data: http://python-data.dr-chuck.net/comments_324255.html

Code:

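from __future__ import print_function  # make print() behave the same on Python 2.7
import re
import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 package

url = 'http://python-data.dr-chuck.net/comments_324255.html'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
spans = soup('span')  # calling the soup is shorthand for findAll('span')

# Serialize every span (tags included) and pull out all runs of digits
span_str = str(spans)
sp = re.findall('([0-9]+)', span_str)
count = 0
for i in sp:
    count = count + int(i)
print('Sum:', count)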
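The weakness of converting the spans to a string is that the regex sees the whole serialized markup, tags and attributes included, not only the text inside each span. A minimal sketch on made-up markup (the `markup` string and its `id` attributes are hypothetical; the real page may have no digits in its attributes) shows how stray digits end up in the sum:

```python
import re

# Hypothetical markup, assumed for illustration only: each span carries an
# id attribute that happens to contain a digit.
markup = ('<span class="comments" id="c7">10</span>'
          '<span class="comments" id="c8">5</span>')

# The question's approach: a digit regex over the whole serialized markup.
found = re.findall(r'[0-9]+', markup)
print(found)                       # ['7', '10', '8', '5']
print(sum(int(n) for n in found))  # 30
```

The span texts add up to 15, but the regex total is 30 because the 7 and 8 from the `id` attributes are counted too.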

Answer 1

No regular expression is needed:

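from bs4 import BeautifulSoup
from requests import get

url = 'http://python-data.dr-chuck.net/comments_324255.html'
html = get(url).text
soup = BeautifulSoup(html, 'lxml')  # needs lxml installed; 'html.parser' is built in

# Each span's text is the number itself, so convert and sum directly
count = sum(int(n.text) for n in soup.find_all('span'))
print(count)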
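`int()` raises `ValueError` if any span holds non-numeric text, which aborts the whole `sum(...)`. If a page might contain such spans (an assumption; the linked page may not), one option is to filter before converting. The helper `sum_numeric` below is hypothetical, not part of Beautiful Soup:

```python
def sum_numeric(texts):
    """Sum the strings in `texts` that are plain non-negative integers.

    Hypothetical helper: entries such as 'n/a' would make int() raise
    ValueError, so they are skipped instead.
    """
    total = 0
    for t in texts:
        t = t.strip()
        if t.isdecimal():  # isdecimal() accepts exactly what int() can parse
            total += int(t)
    return total


# Example with made-up span texts:
print(sum_numeric(['97', ' 12 ', 'n/a']))  # 109
```

With Beautiful Soup it would be called as `sum_numeric(n.text for n in soup.find_all('span'))`.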

Answer 2

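import requests
import bs4

r = requests.get("http://python-data.dr-chuck.net/comments_324255.html")
soup = bs4.BeautifulSoup(r.text, 'lxml')

# Sum the text of every element whose class is "comments" (class_ avoids the Python keyword)
print(sum(int(span.text) for span in soup.find_all(class_="comments")))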
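Unlike Answer 1, this version sums only elements with class "comments" rather than every span. If bs4 or lxml is not available, a rough equivalent can be sketched with Python 3's standard-library `html.parser.HTMLParser`. This is a swapped-in technique, not Beautiful Soup; the sample markup and the `sum_comment_spans` helper are hypothetical, and the sketch assumes each matching span contains a single integer and no nested tags:

```python
from html.parser import HTMLParser


class CommentSpanSum(HTMLParser):
    """Sum the integer text of <span class="comments"> elements."""

    def __init__(self):
        super().__init__()
        self.in_target = False  # True while inside a matching span
        self.total = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            # attrs is a list of (name, value); value may be None
            classes = (dict(attrs).get('class') or '').split()
            self.in_target = 'comments' in classes

    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_target = False

    def handle_data(self, data):
        if self.in_target and data.strip():
            self.total += int(data)


def sum_comment_spans(html):
    parser = CommentSpanSum()
    parser.feed(html)
    parser.close()
    return parser.total


sample = ('<tr><td>Ann</td><td><span class="comments">97</span></td></tr>'
          '<tr><td>Bo</td><td><span class="comments">12</span></td></tr>'
          '<span class="other">5</span>')
print(sum_comment_spans(sample))  # 109
```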



2 answers

Answer 1: score 1, 2017-01-19 14:15:48

Answer 2: score 0, 2017-01-19 14:16:21
