Python Beautiful Soup find_all
Hi, I'm trying to get some information from a website. Please excuse any formatting mistakes; this is my first time posting to SO.
soup.find('div', {"class":"stars"})
From this I get:
<div class="stars" title="4.0 star rating">
  <i class="star star--large star-0"></i>
  <i class="star star--large star-1"></i>
  <i class="star star--large star-2"></i>
  <i class="star star--large star-3"></i>
  <i class="star star--large star-4 star--large--muted"></i>
</div>
I need that "4.0 star rating" value.
When I use:
soup.find('div', {"class":"stars"})["title"]
it works, but the same indexing does not work with find_all. I am trying to find every occurrence and put them all into a list.
Here is my full code:
def get_info():
    from IPython.display import HTML
    import requests
    from bs4 import BeautifulSoup
    n = 1
    for page in range(53):
        url = f"https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews"
        r = requests.get(url)
        soup = BeautifulSoup(r.text, 'lxml')
        all_reviews = soup.find_all('div', {'class':"truncate_review"})
        all_dates = soup.find_all('div', {'class':'review__date'},'title')
        all_titles = soup.find_all('span', {'class':'review__title__text'})
        reviews_class = soup.find('div', {"class":"review__stars"})
        for review in all_reviews:
            all_reviews_list.append(review.text.replace("\n","").replace("\t",""))
        for date in all_dates:
            all_dates_list.append(date.text.replace("\n","").replace("\t",""))
        for title in all_titles:
            all_titles_list.append(title.text.replace("\n","").replace("\t",""))
        for stars in reviews_class.find_all('div', {'class':'stars'}):
            all_star_ratings.append(stars['title'])
        n += 1
Sorry if my indentation is a bit off, but this is my full code.
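Access the attributes of a bs4 element the way you would look up a key in a dictionary.
If you are using find():
soup.find('div', {"class":"stars"})['title']
This works because find() returns a single element (a Tag).
But find_all() returns a list of elements, and indexing a list with a string, as in list['title'], is invalid: Python raises a TypeError.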
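The difference can be checked on a small inline document. This is only a minimal sketch: the HTML is made up for illustration, and it uses Python's built-in html.parser rather than lxml so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; not taken from the real page.
html = """
<div class="stars" title="4.0 star rating"></div>
<div class="stars" title="1.0 star rating"></div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("div", {"class": "stars"})      # a single Tag
every = soup.find_all("div", {"class": "stars"})  # a list-like ResultSet of Tags

# A Tag supports dictionary-style access to its attributes.
first_title = first["title"]

# A list can only be indexed with integers or slices, so this fails.
try:
    every["title"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(first_title)
print(len(every), error_name)
```

Indexing one element of the result first, for example every[0]["title"], works because each item in the list is itself a Tag.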
So you can build the list with a loop:
res = []
for i in soup.find_all('div', {"class":"stars"}):
    res.append(i['title'])
Or, as a one-liner:
res = [i['title'] for i in soup.find_all('div', {"class":"stars"})]
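One caveat with indexing every match: tag['title'] raises a KeyError if a matched element has no title attribute. The sketch below shows two ways to guard against that, assuming (hypothetically) that some "stars" elements might lack the attribute. The markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second element deliberately has no title attribute.
html = """
<div class="stars" title="4.0 star rating"></div>
<div class="stars"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: .get() returns None instead of raising KeyError
# when the attribute is absent.
titles_or_none = [tag.get("title") for tag in soup.find_all("div", class_="stars")]

# Option 2: title=True makes find_all() keep only tags that
# actually carry a title attribute, so indexing is safe.
titles = [tag["title"] for tag in soup.find_all("div", class_="stars", title=True)]

print(titles_or_none)
print(titles)
```

Which option fits depends on whether a missing rating should show up as a gap in the list or be skipped entirely.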
Since you want the rating title of every review, restrict the search to the review containers, i.e. scrape from:
<div class="review__container">
So the code would be:
review = soup.find_all('div',class_="review__container")
res = [i['title'] for j in review for i in j.find_all('div',class_='stars')]
which gives:
['1.0 star rating', '1.0 star rating', '3.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '2.0 star rating', '5.0 star rating', '1.0 star rating', '2.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '5.0 star rating']
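To see why scoping to the container matters, here is a small self-contained sketch. The markup is invented: the "summary" wrapper is a hypothetical stand-in for any rating widget that sits outside the reviews, and html.parser is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup: one rating outside the reviews plus two actual reviews.
# The "summary" class is hypothetical.
html = """
<div class="summary"><div class="stars" title="4.0 star rating"></div></div>
<div class="review__container"><div class="stars" title="1.0 star rating"></div></div>
<div class="review__container"><div class="stars" title="5.0 star rating"></div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching the whole document also picks up the rating outside the reviews.
all_stars = [tag["title"] for tag in soup.find_all("div", class_="stars")]

# Searching inside each review container returns only the review ratings.
containers = soup.find_all("div", class_="review__container")
review_stars = [
    star["title"]
    for container in containers
    for star in container.find_all("div", class_="stars")
]

print(all_stars)
print(review_stars)
```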
How about the following?
from bs4 import BeautifulSoup
html = """<div class="stars" title="4.0 star rating">
<i class="star star--large star-0"></i><i class="star star--large star-
1"></i><i class="star star--large star-2"></i><i class="star star--large
star-3"></i><i class="star star--large star-4 star--large--muted"></i>
</div>"""
soup = BeautifulSoup(html, features="lxml")
element = soup.select('.stars')[0]['title']
print(element)
Output:
4.0 star rating
Using the URL:
import requests
from bs4 import BeautifulSoup
# The "{n}" placeholder must be filled in before the request; page 1 is used here as an example.
url = 'https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews'.format(n=1)
page = requests.get(url=url)
soup = BeautifulSoup(page.text, features="lxml")
elements = soup.select('.stars')
# print(elements)
for element in elements:
    print(element['title'])
Output:
4.0 star rating
3.8 star rating
3.7 star rating
4.3 star rating
3.8 star rating
4.2 star rating
0.0 star rating
0.0 star rating
5.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
3.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
5.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
5.0 star rating
2.0 star rating
5.0 star rating
1.0 star rating
2.0 star rating
1.0 star rating
5.0 star rating
1.0 star rating
5.0 star rating
4.3 star rating
3.5 star rating
4.7 star rating
3.7 star rating
4.8 star rating
3.7 star rating
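The output above also contains values such as 3.8 and 4.3, which suggests that the bare .stars selector matches rating elements outside the individual reviews too (this is an inference from the output, not something checked against the page). If only the per-review ratings are wanted, the selector can be narrowed with a descendant combinator. A minimal sketch on made-up markup, using html.parser so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; only the class names "stars" and
# "review__container" come from the discussion above.
html = """
<div class="stars" title="3.8 star rating"></div>
<div class="review__container">
  <div class="stars" title="1.0 star rating"></div>
</div>
<div class="review__container">
  <div class="stars"></div>
  <div class="stars" title="5.0 star rating"></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ".stars[title]" matches every element with class "stars" and a title attribute.
unscoped = [tag["title"] for tag in soup.select(".stars[title]")]

# The descendant combinator (a space) restricts matches to elements
# nested inside a review container.
scoped = [tag["title"] for tag in soup.select(".review__container .stars[title]")]

print(unscoped)
print(scoped)
```

The [title] attribute selector also skips any element without a title, so the indexing inside the list comprehension cannot raise a KeyError.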