
Web scraping Yelp review ratings with Python

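import urllib.request

from bs4 import BeautifulSoup

rating = []  # aria-label of each matched rating element, across all pages

for i in range(0, 10):
    # Yelp paginates reviews with start=0, 10, 20, ...
    url = "https://www.yelp.com/biz/snow-show-flushing?osq=ice%20cream%20shop&start=" + str(10 * i)

    with urllib.request.urlopen(url) as ourUrl:
        soup = BeautifulSoup(ourUrl, 'html.parser')

    # The generated class names below are specific to Yelp's markup and may change.
    # [1:] skips the first match on each page.
    for r in soup.find_all('span', {'class': "display--inline__373c0__1gaV4 border-color--default__373c0__1yxBb"})[1:]:
        if r.div is None:  # skip matching spans that do not wrap a rating div
            continue
        per_rating = r.div.get('aria-label')
        rating.append(per_rating)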

I am trying to get the rating of every review on each page. There should be only 58 ratings in total, but the list also includes the ratings from the "You Might Also Consider" section.

How can I fix this?

One possible solution is to retrieve the total number of reviews from Yelp with BeautifulSoup. You can then trim your "rating" list to that number of reviews.
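The loop in the question fetches one page at a time. If the "You Might Also Consider" ratings appear on every page, they end up between the pages in the combined list, and a single trim of the whole list would then keep some of them. The sketch below trims each page's matches instead. It rests on two assumptions: Yelp shows 10 reviews per page, which matches the start=10*i offsets in the question, and the review ratings come before the extra ratings on each page. The name trim_page is a hypothetical helper, not part of BeautifulSoup or any Yelp API.

```python
REVIEWS_PER_PAGE = 10  # assumption: matches the start=10*i offsets in the question


def trim_page(page_ratings, page_index, review_count, per_page=REVIEWS_PER_PAGE):
    """Keep only the review ratings from one page's matches (hypothetical helper).

    Assumes the page's review ratings come before any extra ratings, such as
    those in the "You Might Also Consider" section.
    """
    remaining = review_count - page_index * per_page  # reviews not yet seen
    expected = max(0, min(per_page, remaining))        # reviews on this page
    return page_ratings[:expected]


# Made-up data: with 58 reviews, page index 5 holds reviews 51-58,
# so only its first 8 matches are kept.
page = ["5 star rating"] * 8 + ["4 star rating"] * 3
print(len(trim_page(page, 5, 58)))  # 8
```

Inside the question's loop, you would collect page i's aria-labels into their own list and then call rating.extend(trim_page(page_list, i, Review_count)) instead of appending each label directly.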

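import re

# Find the total number of reviews.
# "css-foyide" is a generated Yelp class name and may change.
regex_count = re.compile('.*css-foyide.*')
Review_count = soup.find_all("p", {"class": regex_count})
Review_count = Review_count[0].text  # text that starts with the review count
Review_count = int(Review_count.split()[0])  # total number of reviews

# Keep only the first Review_count ratings.
rating = rating[:Review_count]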
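The review count can also set the number of pages to request, so the loop does not always fetch 10 pages. The sketch below keeps the text parsing separate, so you can check it without a network request. It assumes the count is the first word of the text, as the split()[0] call above does. It also allows for a thousands separator such as "1,234" in case larger counts are formatted that way. The names parse_review_count and page_count are hypothetical.

```python
def parse_review_count(text):
    """Parse text such as '58 reviews' or '1,234 reviews' into an int."""
    first_token = text.split()[0]
    return int(first_token.replace(",", ""))  # drop thousands separators


def page_count(review_count, per_page=10):
    """Number of pages needed to show review_count reviews, per_page at a time."""
    return (review_count + per_page - 1) // per_page  # integer ceiling division


total = parse_review_count("58 reviews")
print(total, page_count(total))  # 58 6
```

With this, the question's loop could run for i in range(page_count(Review_count)) instead of range(0, 10). Review_count would then have to be read from the first page before the loop starts.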

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please credit the site URL or the original address.
