BeautifulSoup: Extract "img alt" content (Web Scraping in Python)
Scraping an alt tag using Python and BeautifulSoup
I'm new to Python and to BeautifulSoup, and I'm trying to scrape the star ratings that reviewers left for a restaurant on Yelp.
So far I have the following code:
import requests
from bs4 import BeautifulSoup as soup
url = "https://www.yelp.com/biz/monkey-house-cafe-huntington-beach"
r = requests.get(url)
page_soup = soup(r.content, "lxml")
review_container = page_soup.findAll("div", {"class": "review-content"})
review_container[0]
When I run this code in a Jupyter Notebook, I get the following output, which corresponds to the most recent review:
<div class="review-content">
<div class="biz-rating biz-rating-large clearfix">
<div>
<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
<img alt="5.0 star rating" class="offscreen" height="303" src="https://s3-media1.fl.yelpcdn.com/assets/srv0/yelp_design_web/41341496d9db/assets/img/stars/stars.png" width="84"/>
</div>
</div>
<span class="rating-qualifier">
5/10/2017
</span>
</div>
<p lang="en">This place is really fun and cute. I was happy to discover it.. <br/><br/>They also have beer and wine here, which is kind of a nice bonus. The sangria is good..</p>
</div>
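My question is: how do I get the star rating from each review?
I think the best approach is to scrape the content of the img alt attribute, but I'm not sure how to do that.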
If you want to extract it from the img alt attribute, you can use:
review_container[0].select('img')[0]['alt'].split()[0]
'5.0'
Alternatively, slice the first three characters of the alt text and convert them to a float:
float(review_container[0].find("img")["alt"][:3])
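Since the question asks for the rating from each review, not just the first one, the same idea can be applied in a loop. The following is only a minimal sketch: it parses a trimmed-down copy of the markup shown in the question instead of fetching the live page, the second review block is made up for illustration, and the helper name extract_ratings is hypothetical. It also uses the built-in "html.parser" so that it runs without lxml installed. Yelp's real markup may differ from this sample.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample modelled on the output in the question.
# The second review block is an assumption, added only for illustration.
SAMPLE_HTML = """
<div class="review-content">
  <div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
    <img alt="5.0 star rating" class="offscreen" src="stars.png"/>
  </div>
</div>
<div class="review-content">
  <div class="i-stars i-stars--regular-3 rating-large" title="3.0 star rating">
    <img alt="3.0 star rating" class="offscreen" src="stars.png"/>
  </div>
</div>
"""


def extract_ratings(html):
    """Return one float per review block that contains a rating image."""
    page_soup = BeautifulSoup(html, "html.parser")
    ratings = []
    for review in page_soup.find_all("div", {"class": "review-content"}):
        # First <img> inside the review that actually has an alt attribute.
        img = review.find("img", alt=True)
        if img is None:
            continue  # skip reviews without a rating image
        # The alt text looks like "5.0 star rating"; keep the leading number.
        ratings.append(float(img["alt"].split()[0]))
    return ratings


ratings = extract_ratings(SAMPLE_HTML)
print(ratings)
```

Splitting on whitespace rather than slicing the first three characters keeps the sketch independent of how many characters the number takes up, and the None check avoids an error for a review block that has no image.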