
Scraping reviews with 'More' text

As the title says, I need help scraping reviews from the website TripAdvisor. The specific link I am using is https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html

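The problem is that some reviews are truncated and end with a 'More' link that reveals the rest of the review (for example, the second review on the page linked above). How can I scrape the reviews that contain this 'More' text?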

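Is there a way to have them already expanded when I open the link, or is it a matter of finding the right tag that contains the whole review?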

Use Selenium and Beautiful Soup. Check whether a 'More' link is present; if it is, click it and then take the page_source.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html')

# Look for 'More' links; find_elements returns an empty list if there are none.
# (Selenium 4 removed find_elements_by_xpath, so By.XPATH is used here.)
more_links = driver.find_elements(By.XPATH, "//span[@class='taLnk ulBlueLinks'][contains(.,'More')]")
if more_links:
    more_links[0].click()

# Crude fixed wait so the expanded text can load before reading the page
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Each review body is a <p class="partial_entry"> element
items = [item.text for item in soup.select("p.partial_entry")]
print(items)

Output:

['Stopped by to get some chicken strips to go.  They were out of soft drinks, but I was getting coffee.  Restrooms were clean.', "We live in page Arizona and go to McDonald's on the occasion that we don't want to cook but almost every time that we stop in the service is horrible. There has been times where the drive thru would not say anything to us until we decided to drive back around to really let them know we were ready to order food. The manager whom i have talked to on multiple occasions acts like it's bo big deal that their restaurant shows no respect for the customers. Finally i decided to write a review before calling corporate. I understand not wanting or liking your job at McDonald's but you made the life decisions to be where you are the least you could do is show some respect for your customers especially the locals of this tourist town.", 'The location was newer, clean and kept up very well. The hot fudge sundaes were great . Stopped by for a snack', 'We stopped in to grab a little snack before heading to Horseshoe Bend. My husband got a double cheeseburger, I ordered an apple pie. His burger was fine. The apples in the pie were all shriveled up. It looked old. I looked at the time on the box and it had expired 4 hours before. I walked back in and asked for a new one, explaining the one they just gave me was quite old. Then he handed me one and said try this one. I looked at the date and it expired 2 hours before. I asked if the had any fresh ones. He went into the back for awhile and came out with a new one.', 'I like the coffee, there was few times they messed up coffee 3x in a row. but its okay i had patience for them to get it right. I only like their fries, coffee, and a very few sandwiches. plus the nuggets. clean restrooms. clean tables but rude managers', 'Ordered mg nuggets and Big Mac for two and waited 25 minutes I decided to go ask for a refund or compensation but the manager did not want He said if I refund you ,you will not have your mealI find that not acceptable to wait that long and Big Macs were coldI am a big traveller and never saw a Manager like that Don’t go there Go to Taco Bell ...', "the employees were very fast and efficient at the service they provided whilst giving me my food. McDonald's is always reliable whenever you want a quick snack.", "It is a newer looking location with a huge amount of parking. The dining area was very large and quite clean. The service was very good. The food was just like any other McD's.", 'win i eat at the best restaurant the meals are the best i love the fries it gives me taste of joy . i like to eat their again i like to eat their win im on the road and i like to never stop eating its my great place to eat', "This is a new facility in what looks like a newer area of Page. Typical McDonald's but great service and new building makes this a good stop if you are looking for a quick fill up."]

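As the page stands, you can't get the full text of a truncated review from it, because that text is not included in the HTML.

One way to get it: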

  • Scrape the page
  • Find all the reviews
  • If a review has a 'More' link:
      • Get its ID
      • Scrape the review's own URL
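The last two steps can be sketched as a pair of small helpers. This is only an illustration: it assumes the `ShowUserReviews-<restaurant id>-r<review id>-<restaurant name>.html` URL pattern used in the code in this answer, which TripAdvisor may change. The function names and the sample review element ID are hypothetical.

```python
WEBSITE = "https://www.tripadvisor.co.uk/"


def review_id_from_element_id(element_id):
    """Turn a review container id such as 'review_123' into '123'."""
    return element_id.replace("review_", "")


def review_page_url(restaurant_id, review_id, restaurant_name):
    """Build the URL of a single review's page (pattern assumed, see above)."""
    return (
        f"{WEBSITE}ShowUserReviews-{restaurant_id}"
        f"-r{review_id}-{restaurant_name}.html"
    )


# Made-up element id, for illustration only.
rid = review_id_from_element_id("review_123456789")
print(review_page_url("g60834-d4106745", rid, "McDonald_s-Page_Arizona"))
```

Keeping the URL construction in one function means only one place needs updating if the site's URL scheme changes.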

Code:

from pprint import pprint

import requests
from bs4 import BeautifulSoup as soup

website = "https://www.tripadvisor.co.uk/"
r_review_str = "Restaurant_Review-"
u_review_str = "ShowUserReviews-"
restaurant_id = "g60834-d4106745"
restaurant_name = "McDonald_s-Page_Arizona"

# Listing page that shows the (possibly truncated) reviews
base_url = website + r_review_str + restaurant_id + "-Reviews-" + restaurant_name + ".html"
req = requests.get(base_url)
page = soup(req.text, 'html.parser')

reviews_text = []
reviews = page.find_all('div', {'class': 'reviewSelector'})
for r in reviews:
    # The container id looks like "review_<id>"; keep only the numeric part
    r_id = r.get('id').replace('review_', '')
    p_text = r.find('p', {'class': 'partial_entry'})
    # A 'More' link inside the paragraph means the text is truncated
    if p_text.find('span', {'class': 'ulBlueLinks'}):
        # Fetch the review's own page, which contains the full text
        url = website + u_review_str + restaurant_id + "-r" + r_id + "-" + restaurant_name + ".html"
        req_u = requests.get(url)
        page_u = soup(req_u.text, "html.parser")
        text = page_u.find('div', {'id': 'review_' + r_id}).find('p', {'class': 'partial_entry'}).text
    else:
        text = p_text.text
    reviews_text.append(text)

pprint(reviews_text)

Output:

['Stopped by to get some chicken strips to go.  They were out of soft drinks, '
 'but I was getting coffee.  Restrooms were clean.',
 "We live in page Arizona and go to McDonald's on the occasion that we don't "
 'want to cook but almost every time that we stop in the service is horrible. '
 'There has been times where the drive thru would not say anything to us until '
 'we decided to drive back around to really let them know we were ready to '
 'order food. The manager whom i have talked to on multiple occasions acts '
 "like it's bo big deal that their restaurant shows no respect for the "
 'customers. Finally i decided to write a review before calling corporate. I '
 "understand not wanting or liking your job at McDonald's but you made the "
 'life decisions to be where you are the least you could do is show some '
 'respect for your customers especially the locals of this tourist town.',
 'The location was newer, clean and kept up very well. The hot fudge sundaes '
 'were great . Stopped by for a snack',
 'We stopped in to grab a little snack before heading to Horseshoe Bend. My '
 'husband got a double cheeseburger, I ordered an apple pie. His burger was '
 'fine. The apples in the pie were all shriveled up. It looked old. I looked '
 'at the time on the box and it had expired 4 hours before. I walked back in '
 'and asked for a new one, explaining the one they just gave me was quite old. '
 'Then he handed me one and said try this one. I looked at the date and it '
 'expired 2 hours before. I asked if the had any fresh ones. He went into the '
 'back for awhile and came out with a new one.',
 'I like the coffee, there was few times they messed up coffee 3x in a row. '
 'but its okay i had patience for them to get it right. I only like their '
 'fries, coffee, and a very few sandwiches. plus the nuggets. clean restrooms. '
 'clean tables but rude managers',
 'Ordered mg nuggets and Big Mac for two and waited 25 minutes I decided to go '
 'ask for a refund or compensation but the manager did not want He said if I '
 'refund you ,you will not have your mealI find that not acceptable to wait '
 'that long and Big Macs were coldI am a big traveller and never saw a Manager '
 'like that Don’t go there Go to Taco Bell ...',
 'the employees were very fast and efficient at the service they provided '
 "whilst giving me my food. McDonald's is always reliable whenever you want a "
 'quick snack.',
 'It is a newer looking location with a huge amount of parking. The dining '
 'area was very large and quite clean. The service was very good. The food was '
 "just like any other McD's.",
 'win i eat at the best restaurant the meals are the best i love the fries it '
 'gives me taste of joy . i like to eat their again i like to eat their win im '
 'on the road and i like to never stop eating its my great place to eat',
 'This is a new facility in what looks like a newer area of Page. Typical '
 "McDonald's but great service and new building makes this a good stop if you "
 'are looking for a quick fill up.']
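
The code above hard-codes `restaurant_id` and `restaurant_name`. A minimal sketch of deriving them from a restaurant page URL instead, so the same script could be pointed at another listing, might look like this. The regular expression is an assumption based only on the shape of the URL in the question, and `split_restaurant_url` is a hypothetical helper name.

```python
import re

# Assumed shape: .../Restaurant_Review-<id>-Reviews-<name>.html
_URL_RE = re.compile(
    r"Restaurant_Review-(?P<id>g\d+-d\d+)-Reviews-(?P<name>[^/]+?)\.html$"
)


def split_restaurant_url(url):
    """Return (restaurant_id, restaurant_name) parsed from a restaurant URL."""
    match = _URL_RE.search(url)
    if match is None:
        raise ValueError(f"unrecognised restaurant URL: {url}")
    return match.group("id"), match.group("name")


url = (
    "https://www.tripadvisor.co.uk/"
    "Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html"
)
restaurant_id, restaurant_name = split_restaurant_url(url)
print(restaurant_id, restaurant_name)
```

URLs that do not follow this shape raise a `ValueError` rather than silently producing a wrong ID.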
