
Scraping text from <span> elements with BeautifulSoup

I am scraping data from a website using BeautifulSoup, but I cannot find a way to print the text inside the span elements. Below is the structure:

<span class="greyText smallText">
                avg rating 4.02 —
                132,623 ratings  —
                published 2014
              </span>
<span class="greyText smallText">
                avg rating 4.03 —
                82,319 ratings  —
                published 2015
              </span>

I need to extract the average rating and the number of ratings as separate values.

import requests
from bs4 import BeautifulSoup as bs

url = "https://someurl"
page = requests.get(url)
soup = bs(page.content, 'html.parser')
print(soup)
# Collect every <span class="greyText smallText"> on the page
ratings = soup.find_all('span', attrs={'class': 'greyText smallText'})
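A minimal sketch of one way to get from those spans to separate values, assuming the markup looks exactly like the sample above: split each span's text on the dash separator, trim the pieces, and convert the numbers. The inline sample HTML stands in for `page.content` so the example runs without a network request; `SAMPLE_HTML`, `DASH` and `parse_rating_spans` are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content, copied from the question's markup.
SAMPLE_HTML = """
<span class="greyText smallText">
                avg rating 4.02 \u2014
                132,623 ratings  \u2014
                published 2014
              </span>
<span class="greyText smallText">
                avg rating 4.03 \u2014
                82,319 ratings  \u2014
                published 2015
              </span>
"""

DASH = "\u2014"  # the dash character that separates the fields in each span


def parse_rating_spans(html):
    """Return one dict per span with avg_rating, ratings and published."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for span in soup.find_all("span", class_="greyText smallText"):
        # Split on the dash and trim the newlines/indentation around each field,
        # giving e.g. ['avg rating 4.02', '132,623 ratings', 'published 2014'].
        fields = [part.strip() for part in span.get_text().split(DASH)]
        avg_rating = float(fields[0].split()[-1])              # '4.02' -> 4.02
        ratings = int(fields[1].split()[0].replace(",", ""))   # '132,623' -> 132623
        published = int(fields[2].split()[-1])                 # '2014' -> 2014
        results.append(
            {"avg_rating": avg_rating, "ratings": ratings, "published": published}
        )
    return results


for row in parse_rating_spans(SAMPLE_HTML):
    print(row["avg_rating"], row["ratings"])
```

Splitting on the separator rather than on newlines keeps the parsing independent of how the site indents the markup; it still assumes the fields always appear in this order.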

Alternative solution: you can use the re module to extract the average ratings:

import re
from bs4 import BeautifulSoup

txt = '''<span class="greyText smallText">
                avg rating 4.02 —
                132,623 ratings  —
                published 2014
              </span>
<span class="greyText smallText">
                avg rating 4.03 —
                82,319 ratings  —
                published 2015
              </span>'''

soup = BeautifulSoup(txt, 'html.parser')

for span in soup.select('span.greyText.smallText'):
    avg_rating = re.search(r'avg rating ([\d.]+)', span.text)
    if avg_rating:
        print(avg_rating[1])

Prints:

4.02
4.03
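The same regex idea can capture the number of ratings as well, since the question asks for both values. This is a sketch assuming the text always reads "avg rating X", then a dash, then "N ratings"; the named groups `avg` and `count`, `RATING_RE` and `extract_ratings` are illustrative names, not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

SAMPLE_HTML = """
<span class="greyText smallText">
                avg rating 4.02 \u2014
                132,623 ratings  \u2014
                published 2014
              </span>
<span class="greyText smallText">
                avg rating 4.03 \u2014
                82,319 ratings  \u2014
                published 2015
              </span>
"""

# \s+ tolerates the line break and indentation between the two fields;
# \u2014 matches the dash separator; [\d,]+ allows thousands separators.
RATING_RE = re.compile(
    r"avg rating\s+(?P<avg>[\d.]+)\s+\u2014\s+(?P<count>[\d,]+)\s+ratings"
)


def extract_ratings(html):
    """Return (average rating, number of ratings) pairs, one per span."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for span in soup.select("span.greyText.smallText"):
        match = RATING_RE.search(span.text)
        if match:  # skip spans that do not follow the expected layout
            pairs.append((float(match["avg"]), int(match["count"].replace(",", ""))))
    return pairs


print(extract_ratings(SAMPLE_HTML))
```

With the sample markup this prints `[(4.02, 132623), (4.03, 82319)]`.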
Another approach is to strip the surrounding whitespace from each span's text:

In [32]: [i.text.strip() for i in soup.find_all("span",class_="greyText smallText")]
Out[32]:
['avg rating 4.02 —\n                132,623 ratings  —\n                published 2014',
 'avg rating 4.03 —\n                82,319 ratings  —\n                published 2015']

Average rating as a separate value (the first line of each span):

In [48]: [i.text.strip().split("\n")[0] for i in soup.find_all("span",class_="greyText smallText")]
Out[48]: ['avg rating 4.02 —', 'avg rating 4.03 —']
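Extending the line-based approach to the second line gives the number of ratings too. A minimal sketch, assuming every span keeps the three-line layout shown in the question; the variable names are illustrative, and the values are left as strings, as in the answer above.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<span class="greyText smallText">
                avg rating 4.02 \u2014
                132,623 ratings  \u2014
                published 2014
              </span>
<span class="greyText smallText">
                avg rating 4.03 \u2014
                82,319 ratings  \u2014
                published 2015
              </span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
spans = soup.find_all("span", class_="greyText smallText")

# Each stripped span text has three lines: avg rating, ratings count, published.
lines = [span.text.strip().split("\n") for span in spans]

# Line 0 splits into ['avg', 'rating', '4.02', <dash>], so the number is token 2;
# line 1 splits into ['132,623', 'ratings', <dash>], so the number is token 0.
avg_ratings = [parts[0].split()[2] for parts in lines]
rating_counts = [parts[1].split()[0] for parts in lines]

print(avg_ratings)
print(rating_counts)
```

This relies on positions within each line, so it is more fragile than the regex version if the site changes its wording.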

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.
