Selenium / Python: How to get the full text after expanding it with Selenium?

I am trying to scrape reviews from TripAdvisor. Long reviews are shown only partially, and you have to click 'More' to display the full review. I tried getting the text after clicking 'More' (and I can see that the text is expanded), but all I get is the partial review.

My code (to scrape one specific review) is as follows:

driver = webdriver.Firefox()
driver.get(url)
review = driver.find_element_by_id("review_541350982") 
review.find_element_by_class_name("taLnk.ulBlueLinks").click()
driver.wait = WebDriverWait(driver, 5)
new_review = driver.find_element_by_id("review_541350982")
entry = new_review.find_element_by_class_name("partial_entry")
print entry.text

This is the HTML before clicking on 'More':

<p class="partial_entry">This place blah blah blah What an...
<span class="taLnk ulBlueLinks" onclick="widgetEvCall('handlers.clickExpand',event,this);">More</span>
</p>

and this is the HTML after:

<p class="partial_entry">This place blah blah blah What an incredible monument from both a historic and construction point of view.</p>
<span class="taLnk ulBlueLinks" onclick="widgetEvCall('handlers.clickCollapse',event,this);">Show less</span>

I noticed that after clicking 'More', the <span> is no longer inside the <p> but comes after it. Not sure if this is useful.

Any advice is greatly appreciated!

EDIT: I noticed that using time.sleep(1) in place of the driver.wait line solved the problem. Is there a better way to do this, so that the new entry is read automatically once it changes, without having to set an arbitrary waiting time?

It is evident from your code that WebDriverWait was defined but never actually used: assigning it to driver.wait does not pause anything, because the wait only happens when you call until() with a condition. To print the full text "This place blah blah blah What an incredible monument from both a historic and construction point of view.", you can use the following code block:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Click 'More' inside the review container.
review = driver.find_element(By.ID, "review_541350982")
review.find_element(By.CSS_SELECTOR, ".taLnk.ulBlueLinks").click()

# text_to_be_present_in_element expects a (By, value) locator, not a WebElement.
entry_locator = (By.XPATH, "//*[@id='review_541350982']//p[@class='partial_entry']")
expected = 'This place blah blah blah What an incredible monument from both a historic and construction point of view.'
WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element(entry_locator, expected))

# Re-locate the paragraph and print the expanded text.
print(driver.find_element(*entry_locator).text)
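
The explicit wait above needs the full review text up front, which a scraper normally does not have. A minimal sketch of an alternative, assuming the paragraph's text changes once it expands: a custom condition that WebDriverWait polls until the text differs from what it was before the click. The names text_changed_from and read_full_review are hypothetical helpers, not part of Selenium; the locator strings "xpath" and "tag name" are the values behind By.XPATH and By.TAG_NAME.

```python
class text_changed_from(object):
    """Wait condition: passes once the located element's text differs from old_text."""

    def __init__(self, locator, old_text):
        self.locator = locator    # (strategy, value) tuple, e.g. ("xpath", "//p")
        self.old_text = old_text

    def __call__(self, driver):
        # Re-locate on every poll so a re-rendered node is picked up.
        text = driver.find_element(*self.locator).text
        # Return the new text when it has changed; False keeps the wait polling.
        return text if text != self.old_text else False


def read_full_review(driver, review_id, timeout=10):
    """Hypothetical helper: click 'More' and return the expanded review text."""
    # Imported here so the condition class above has no Selenium dependency.
    from selenium.webdriver.support.ui import WebDriverWait

    locator = ("xpath", '(//*[@id="%s"]//p)[1]' % review_id)
    partial = driver.find_element(*locator)
    old_text = partial.text                           # truncated text, including 'More'
    partial.find_element("tag name", "span").click()  # the 'More' link inside <p>
    return WebDriverWait(driver, timeout).until(text_changed_from(locator, old_text))
```

With a live driver this would be called as read_full_review(driver, "review_541350982"). WebDriverWait raises TimeoutException if the text has not changed within the timeout, for example when a review is short and has no 'More' link to expand.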

Locate the review and click 'More':

review = driver.find_element_by_id("review_541350982")
partial_text = review.find_element_by_tag_name('p')
partial_text.find_element_by_tag_name('span').click()

Relocate the review using XPath and output the text:

new_review = driver.find_element_by_xpath('(//*[@id="review_541350982"]//p)[1]')
print(new_review.text)
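
Reading the text immediately after the click can still race with the page's JavaScript. A sketch of one way to wait without a fixed sleep, based on the HTML shown in the question, where the 'More' <span> sits inside the <p> before expansion and outside it afterwards: poll until the paragraph has no <span> child. The function more_link_gone is a hypothetical helper, and the assumption that the span always leaves the paragraph comes only from that one HTML sample.

```python
def more_link_gone(paragraph_locator):
    """Build a wait condition that passes once the paragraph has no <span> child."""

    def condition(driver):
        paragraph = driver.find_element(*paragraph_locator)
        # find_elements returns an empty list (no exception) when nothing matches.
        if paragraph.find_elements("tag name", "span"):
            return False      # 'More' is still inside <p>: keep polling
        return paragraph      # expanded: hand the element back to the caller

    return condition
```

It can be passed to an explicit wait, for example WebDriverWait(driver, 10).until(more_link_gone(("xpath", '(//*[@id="review_541350982"]//p)[1]'))).text, which returns the expanded text or raises TimeoutException if the span never leaves the paragraph.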

HTH
