
Selenium Python: unable to scroll down while fetching Google reviews

I am trying to fetch Google reviews with Selenium in Python. I have imported webdriver from the selenium Python module and initialized self.driver as follows:

self.driver = webdriver.Chrome(executable_path="./chromedriver.exe",chrome_options=webdriver.ChromeOptions())

After this, I use the following code to type the name of the company whose reviews I need into the Google homepage. For now I am trying to fetch reviews for "STANLEY BRIDGE CYCLES AND SPORTS LIMITED":

company_name = self.driver.find_element_by_name("q")
company_name.send_keys("STANLEY BRIDGE CYCLES AND SPORTS LIMITED ")
time.sleep(2)

After this, I click the Google Search button using the following code:

self.driver.find_element_by_name("btnK").click()
time.sleep(2)

Finally I am on the page where I can see the results. Now I want to click the "View all Google reviews" link, for which I use the following code:

self.driver.find_elements_by_link_text("View all Google reviews")[0].click()
time.sleep(2)

Now I am able to get reviews, but only 10. I need at least 20 reviews for a company. For that I am trying to scroll the page down using the following code:

self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)

Even when using the above code to scroll down the page, I still get only 10 reviews. I am not getting any error, though.

I need help scrolling down the page to get at least 20 reviews. Based on my online search for this issue, people have mostly used driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") to scroll the page down whenever required, but for me this is not working. I checked, and the height of the page is the same before and after that call.

Use JavaScript to scroll to the last review; this will trigger loading of additional reviews.

last_review = self.driver.find_element_by_css_selector('div.gws-localreviews__google-review:last-of-type')
self.driver.execute_script('arguments[0].scrollIntoView(true);', last_review)

EDIT:

The following example works correctly for me on Firefox and Chrome. You can reuse the extract_google_reviews function for your needs.

import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def extract_google_reviews(driver, query):
    # find_element(By..., ...) is used because the find_element_by_* helpers
    # were removed in recent Selenium 4 releases.
    driver.get('https://www.google.com/?hl=en')
    driver.find_element(By.NAME, 'q').send_keys(query)
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.NAME, 'btnK'))).click()

    # The link text starts with the total number of reviews.
    reviews_header = driver.find_element(By.CSS_SELECTOR, 'div.kp-header')
    reviews_link = reviews_header.find_element(By.PARTIAL_LINK_TEXT, 'Google reviews')
    number_of_reviews = int(reviews_link.text.split()[0])
    reviews_link.click()

    all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
    while len(all_reviews) < number_of_reviews:
        # Scrolling the last loaded review into view triggers the next batch.
        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
        # Wait until the loading indicator is gone before counting again.
        WebDriverWait(driver, 5, 0.25).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
        all_reviews = driver.find_elements(By.CSS_SELECTOR, 'div.gws-localreviews__google-review')

    reviews = []
    for review in all_reviews:
        try:
            # Long reviews keep their complete text in a separate span.
            full_text_element = review.find_element(By.CSS_SELECTOR, 'span.review-full-text')
        except NoSuchElementException:
            full_text_element = review.find_element(By.CSS_SELECTOR, 'span[class^="r-"]')
        reviews.append(full_text_element.get_attribute('textContent'))

    return reviews

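if __name__ == '__main__':
    # Create the driver before the try block so that the finally clause
    # never refers to an undefined name if the browser fails to start.
    driver = webdriver.Firefox()
    try:
        reviews = extract_google_reviews(driver, 'STANLEY BRIDGE CYCLES AND SPORTS LIMITED')
    finally:
        driver.quit()

    print(reviews)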
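The example above reads the review count with int(reviews_link.text.split()[0]), which only works when the link text starts with a plain number. A more tolerant parser could look like the sketch below. The helper name parse_review_count is hypothetical, and the assumption that the count may contain thousands separators (for example "1,234 Google reviews") is mine, not something stated in the answer.

```python
import re


def parse_review_count(link_text):
    """Return the first integer found in link_text, or None if there is none.

    Hypothetical helper: commas inside the number are treated as
    thousands separators (an assumption about how the count is shown).
    """
    match = re.search(r'\d[\d,]*', link_text)
    if match is None:
        return None
    return int(match.group(0).replace(',', ''))


print(parse_review_count('53 Google reviews'))
print(parse_review_count('View all Google reviews'))
```

Returning None for text such as "View all Google reviews" lets the caller decide what to do when no count is available, instead of failing with a ValueError.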

lenOfPage = driver.execute_script('window.scrollTo(0, [hard code the height])')

Personally, I would hard-code the height if I were running this automated test against the same page over and over again.

Or you can loop continuously, scrolling down the page until the element is found (if it exists).
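The looping idea could be sketched roughly as follows. This is only an illustration: scroll_until_found, its parameters and the find_element callback are hypothetical names, and the driver is assumed to expose Selenium's execute_script method. The loop is bounded by max_scrolls so it cannot run forever if the element never appears.

```python
import time


def scroll_until_found(driver, find_element, max_scrolls=20, pause=1.0,
                       scroll_script='window.scrollTo(0, document.body.scrollHeight);'):
    """Scroll repeatedly until find_element(driver) returns something truthy.

    find_element is a caller-supplied callback (hypothetical) that returns
    the element, or None / an empty list while it is not on the page yet.
    Returns the found value, or None after max_scrolls attempts.
    """
    for _ in range(max_scrolls):
        found = find_element(driver)
        if found:
            return found
        driver.execute_script(scroll_script)
        time.sleep(pause)  # give the page time to load more content
    # One last check after the final scroll.
    return find_element(driver) or None
```

The scroll script is a parameter because, as the question shows, scrolling the window does not always change anything; a different script can be passed in where the content sits in its own scrollable container.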

Alternatively, you can get all of the reviews without browser automation.

The only thing you need is the data_fid, which you can find in the page source of the place you searched for.


In this case that's: 0x48762038283b0bc3:0xc373b8d4227d0090

After that, you just have to make a request to: https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x48762038283b0bc3:0xc373b8d4227d0090,sort_by:,next_page_token:,associated_topic:,_fmt:pc

There you will find all the review data, as well as the next_page_token, so you can query the next 10 reviews.

In this case next_page_token is: EgIICg

So, the request URL for the next 10 reviews would be: https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x48762038283b0bc3:0xc373b8d4227d0090,sort_by:,next_page_token:EgIICg,associated_topic:,_fmt:pc
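The two URLs above differ only in the next_page_token value, so building them could be wrapped in a small helper. This is a minimal sketch: build_reviews_url is a hypothetical name, and the parameter layout is simply copied from the URLs in this answer, so it may stop working if the endpoint changes.

```python
def build_reviews_url(feature_id, next_page_token='', hl='en'):
    """Build the reviewDialog URL in the format shown in this answer.

    feature_id is the data_fid from the page source; next_page_token is
    empty for the first page of reviews.
    """
    async_params = ','.join([
        f'feature_id:{feature_id}',
        'sort_by:',
        f'next_page_token:{next_page_token}',
        'associated_topic:',
        '_fmt:pc',
    ])
    return f'https://www.google.com/async/reviewDialog?hl={hl}&async={async_params}'


fid = '0x48762038283b0bc3:0xc373b8d4227d0090'
print(build_reviews_url(fid))            # first page
print(build_reviews_url(fid, 'EgIICg'))  # next 10 reviews
```

Fetching the URL and extracting each page's next_page_token from the response is left out here, since the answer does not describe the response format.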

You could also use a third-party solution like SerpApi. It's a paid API with a free trial. We handle proxies, solve captchas, and parse all rich structured data for you.

Example Python code (also available in other libraries):

from serpapi import GoogleSearch

params = {
  "api_key": "secret_api_key",
  "engine": "google_maps_reviews",
  "hl": "en",
  "data_id": "0x48762038283b0bc3:0xc373b8d4227d0090",
}

search = GoogleSearch(params)
results = search.get_dict()

Example JSON output:

"place_info": {
  "title": "Stanley Bridge Cycles & Sports Ltd",
  "address": "Newnham Parade, 11 College Rd, Cheshunt, Waltham Cross, United Kingdom",
  "rating": 5,
  "reviews": 53
},
"reviews": [
  {
    "user": {
      "name": "Armilson Correia",
      "link": "https://www.google.com/maps/contrib/102797076683495103766?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAh",
      "thumbnail": "https://lh3.googleusercontent.com/a-/AOh14GgCCH69E_qgfu3pa1xbTsyvH9ORn8PEonb5FcubKg=s40-c-c0x00000000-cc-rp-mo-ba3-br100",
      "local_guide": true,
      "reviews": 48,
      "photos": 9
    },
    "rating": 5,
    "date": "2 days ago",
    "snippet": "In my opinion The best bike shop In radios of 60 miles Very professional and excellent customer service My bike come out from there riding like a New ,no Words just perfect"
  },
  {
    "user": {
      "name": "John Janes",
      "link": "https://www.google.com/maps/contrib/104286744244406721398?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAt",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJzRZRQx74RYqpNQArE0ER-d24iQ-3kAwK64-46u=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 2,
      "photos": 1
    },
    "rating": 5,
    "date": "a year ago",
    "snippet": "The guys recently built my new bike and the advice on components to use was invaluable. Even the wheels were built from scratch. A knowledgeable efficient team with great attention to detail. I wouldn't go anywhere else .",
    "likes": 1,
    "images": [
      "https://lh5.googleusercontent.com/p/AF1QipMc5u1rIZ88w-cfeAeF2s6bSndHMhLw8YC_BllS=w100-h100-p-n-k-no"
    ]
  },
  {
    "user": {
      "name": "James Wainwright",
      "link": "https://www.google.com/maps/contrib/116302076794615919905?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARA6",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJwx8OTba1pQ9lrzxy7LU5BnrJYWu90METBaK68F=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 36,
      "photos": 7
    },
    "rating": 5,
    "date": "a month ago",
    "snippet": "Want to thank the guys for giving my bike the full service it needed .Its now like new again and I didn't realise how much had worn out.Recomend to anyone in the cheshunt area."
  },
  ...
]
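Given a results dictionary shaped like the JSON output above, the review text can be pulled out with a few lines of plain Python. This is a sketch only: summarize_reviews is a hypothetical helper, and it assumes nothing beyond the keys visible in the sample ("reviews", "user", "name", "rating", "snippet").

```python
def summarize_reviews(results):
    """Return (name, rating, snippet) tuples from a results dict.

    Missing keys yield None rather than raising, since not every review
    in the sample output carries the same fields.
    """
    summary = []
    for review in results.get('reviews', []):
        user = review.get('user', {})
        summary.append((user.get('name'), review.get('rating'), review.get('snippet')))
    return summary


# Small hand-made sample with the same shape as the output above.
sample = {
    'reviews': [
        {'user': {'name': 'John Janes'}, 'rating': 5, 'snippet': 'Great shop.'},
        {'user': {}, 'rating': 4},
    ]
}
for name, rating, snippet in summarize_reviews(sample):
    print(name, rating, snippet)
```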

Check out the documentation for more details.

Disclaimer: I work at SerpApi.

Please share the URL of your page. I've just checked, and scrollTo works.

driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

Alternatively, you can scroll smoothly:

self.driver.execute_script('window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" });')
