
Python Beautiful Soup find_all

Hi, I'm trying to get some information from a website. Please forgive any formatting mistakes; this is my first time posting to SO.

soup.find('div', {"class":"stars"}) 

From this I get:

<div class="stars" title="4.0 star rating">
<i class="star star--large star-0"></i>
<i class="star star--large star-1"></i>
<i class="star star--large star-2"></i>
<i class="star star--large star-3"></i>
<i class="star star--large star-4 star--large--muted"></i>
</div>

I need that "4.0 star rating".

When I use:

soup.find('div', {"class":"stars"})["title"]

it works, but the same approach does not work with find_all. I'm trying to find every occurrence and put them all into a list.

Here is my full code:

    def get_info():
        from IPython.display import HTML
        import requests
        from bs4 import BeautifulSoup
        n = 1
        for page in range(53):
            url = f"https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews"
            r = requests.get(url)
            soup = BeautifulSoup(r.text, 'lxml')
            all_reviews = soup.find_all('div', {'class':"truncate_review"})
            all_dates = soup.find_all('div', {'class':'review__date'},'title')
            all_titles = soup.find_all('span', {'class':'review__title__text'})
            reviews_class = soup.find('div', {"class":"review__stars"})
            for review in all_reviews:
                all_reviews_list.append(review.text.replace("\n","").replace("\t",""))
            for date in all_dates:
                all_dates_list.append(date.text.replace("\n","").replace("\t",""))
            for title in all_titles:
                all_titles_list.append(title.text.replace("\n","").replace("\t",""))
            for stars in reviews_class.find_all('div', {'class':'stars'}):
                all_star_ratings.append(stars['title'])
            n += 1

Sorry if the indentation is a bit messy, but this is my full code.

Access the attributes of a bs4 element the same way you would look up a key in a dictionary.
If you are using find():

soup.find('div', {"class":"stars"}) ['title']

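This works because find() returns a single element.
find_all(), however, returns a list, and indexing a list with a string (list[string]) is not a valid operation.
So you can build a list with a loop: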

res = []
for i in soup.find_all('div', {"class":"stars"}):
    res.append(i['title'])

Or, as a one-liner:

res = [i['title'] for i in soup.find_all('div', {"class":"stars"})]
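
The difference between the two calls can be reproduced offline. The following is a minimal sketch that uses a made-up two-element HTML snippet (not the real page) and the standard-library `html.parser` backend instead of `lxml`, so it needs no network access:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the snippet in the question.
html = """
<div class="stars" title="4.0 star rating"></div>
<div class="stars" title="1.0 star rating"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag, so attribute lookup works directly.
first_title = soup.find("div", {"class": "stars"})["title"]

# find_all() returns a list-like ResultSet; indexing it with a string fails.
results = soup.find_all("div", {"class": "stars"})
try:
    results["title"]
    error = None
except TypeError as exc:
    error = exc

# Index each Tag in the list instead.
titles = [tag["title"] for tag in results]

print(first_title)
print(type(error).__name__)
print(titles)
```

The `try`/`except` is only there to show the failure mode; in real code you would go straight to the list comprehension.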

Since you want the title for every review, you need to restrict the search to the review containers, that is, scrape from:

<div class="review__container">

So the code would be:

review = soup.find_all('div',class_="review__container")
res = [i['title'] for j in review for i in j.find_all('div',class_='stars')]

Which gives:

['1.0 star rating', '1.0 star rating', '3.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '2.0 star rating', '5.0 star rating', '1.0 star rating', '2.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '5.0 star rating']
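
To see why scoping to the container matters, here is a small sketch on invented markup. It assumes the page also contains a `div.stars` element outside any review (the `summary` wrapper is hypothetical), and it shows a CSS descendant selector as an equivalent alternative to the nested comprehension:

```python
from bs4 import BeautifulSoup

# Hypothetical page layout: one site-wide summary rating plus two reviews.
html = """
<div class="summary"><div class="stars" title="4.0 star rating"></div></div>
<div class="review__container"><div class="stars" title="1.0 star rating"></div></div>
<div class="review__container"><div class="stars" title="5.0 star rating"></div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Unscoped: picks up every div.stars on the page, including the summary.
all_titles = [tag["title"] for tag in soup.find_all("div", class_="stars")]

# Scoped: only div.stars elements nested inside a review container.
containers = soup.find_all("div", class_="review__container")
review_titles = [
    star["title"]
    for container in containers
    for star in container.find_all("div", class_="stars")
]

# Equivalent CSS descendant selector.
css_titles = [tag["title"] for tag in soup.select("div.review__container div.stars")]

print(all_titles)
print(review_titles)
print(css_titles)
```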

How about the following:

from bs4 import BeautifulSoup

html = """<div class="stars" title="4.0 star rating">
<i class="star star--large star-0"></i>
<i class="star star--large star-1"></i>
<i class="star star--large star-2"></i>
<i class="star star--large star-3"></i>
<i class="star star--large star-4 star--large--muted"></i>
</div>"""

soup = BeautifulSoup(html, features="lxml")
element = soup.select('.stars')[0]['title']
print(element)

Prints:

4.0 star rating
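
Indexing `select(...)[0]` raises an `IndexError` when nothing matches. As a hedged alternative, `select_one()` returns the first match or `None`, which is easier to guard. The markup and the `.no-such-class` selector below are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the snippet above.
html = '<div class="stars" title="4.0 star rating"></div>'
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match, or None when nothing matches.
element = soup.select_one(".stars")
title = element["title"] if element is not None else None

# A selector that matches nothing yields None instead of raising IndexError.
missing = soup.select_one(".no-such-class")

print(title)
print(missing)
```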

Using the URL:

import requests
from bs4 import BeautifulSoup

n = 1  # page number to fetch
url = f'https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews'
page = requests.get(url=url)

soup = BeautifulSoup(page.text, features="lxml")

elements = soup.select('.stars')
# print(elements)

for element in elements:
    print(element['title'])

Prints:

4.0 star rating
3.8 star rating
3.7 star rating
4.3 star rating
3.8 star rating
4.2 star rating
0.0 star rating
0.0 star rating
5.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
3.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
5.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
1.0 star rating
5.0 star rating
2.0 star rating
5.0 star rating
1.0 star rating
2.0 star rating
1.0 star rating
5.0 star rating
1.0 star rating
5.0 star rating
4.3 star rating
3.5 star rating
4.7 star rating
3.7 star rating
4.8 star rating
3.7 star rating
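
Selecting `.stars` across the whole page returns 42 entries here, more than the 26 obtained when the search is limited to the review containers. One way to keep each rating attached to its own review is to walk the containers and build one record per review. This is only a sketch: the class names come from the question and the answers above, but the nesting shown in `sample` is an assumption about the page layout, and `clean` and `parse_reviews` are hypothetical helper names.

```python
from bs4 import BeautifulSoup


def clean(text):
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    return " ".join(text.split())


def parse_reviews(html):
    """Return one dict per review container found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for container in soup.find_all("div", class_="review__container"):
        title = container.find("span", class_="review__title__text")
        body = container.find("div", class_="truncate_review")
        stars = container.find("div", class_="stars")
        records.append({
            "title": clean(title.get_text()) if title is not None else None,
            "review": clean(body.get_text()) if body is not None else None,
            "stars": stars.get("title") if stars is not None else None,
        })
    return records


# Hypothetical sample markup; the real page layout may differ.
sample = """
<div class="review__container">
  <span class="review__title__text">\n\tGreat phone\n</span>
  <div class="stars" title="5.0 star rating"></div>
  <div class="truncate_review">Works   well.\n</div>
</div>
<div class="review__container">
  <div class="stars" title="1.0 star rating"></div>
</div>
"""

records = parse_reviews(sample)
for record in records:
    print(record)
```

Missing fields become `None` rather than shifting later values up, which is the risk with separate parallel lists for reviews, dates, titles, and ratings.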

