What is the ideal way to use xml data in python html parsing with Beautiful Soup?

Question

What is the ideal way to convert XML to text when parsing HTML in Python with Beautiful Soup?

When I parse HTML with the BeautifulSoup library in Python 2.7, I can get as far as building the "soup", but I have no idea how to extract the data I need from it, so I tried converting everything to a string.

In the following example, I want to extract all the numbers in the span tags and add them up. Is there a better way?

Data (an HTML page): http://python-data.dr-chuck.net/comments_324255.html

CODE:

import re
import urllib2

from BeautifulSoup import BeautifulSoup

url = 'http://python-data.dr-chuck.net/comments_324255.html'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)

# Find every <span> tag, then flatten the whole result list into one string
spans = soup('span')
span_str = str(spans)

# Pull every run of digits out of that string and add them up
sp = re.findall('([0-9]+)', span_str)
count = 0
for i in sp:
    count = count + int(i)
print 'Sum:', count

Answer 1

You don't need a regex. Read the text of each span and convert it to an integer:

from bs4 import BeautifulSoup
from requests import get

url = 'http://python-data.dr-chuck.net/comments_324255.html'
html = get(url).text
soup = BeautifulSoup(html, 'lxml')

# find_all is the current bs4 name; findAll is kept only as a legacy alias
count = sum(int(n.text) for n in soup.find_all('span'))
print('Sum:', count)
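
The same approach can be tried offline. The following is a minimal sketch, assuming the page holds one number per span inside a table; the sample markup, the names and numbers in it, and the helper name sum_spans are made up for illustration and are not taken from the real page. It uses Python's built-in 'html.parser' so that lxml does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample imitating the page layout; names and numbers are made up.
SAMPLE_HTML = """
<table>
<tr><td>Alice</td><td><span class="comments">97</span></td></tr>
<tr><td>Bob</td><td><span class="comments">3</span></td></tr>
<tr><td>Carol</td><td><span class="comments">20</span></td></tr>
</table>
"""


def sum_spans(html):
    """Return the sum of the integers found in every <span> element."""
    # 'html.parser' ships with Python, so no extra parser is required
    soup = BeautifulSoup(html, "html.parser")
    # get_text(strip=True) returns the span's text without surrounding whitespace
    return sum(int(span.get_text(strip=True)) for span in soup.find_all("span"))


print("Sum:", sum_spans(SAMPLE_HTML))  # Sum: 120
```

Because each number is read from the span's text rather than from the string form of the whole tag list, digits that appear in attributes or elsewhere in the markup are never counted.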

Answer 2

import bs4
import requests

r = requests.get("http://python-data.dr-chuck.net/comments_324255.html")
soup = bs4.BeautifulSoup(r.text, 'lxml')

# Sum the text of every element whose class is "comments"
print(sum(int(span.text) for span in soup.find_all(class_="comments")))

output:
