Python - Selenium: Looping through find_elements_by() to scrape AngularJS elements

Question

I am scraping real estate data. On sites generated with JavaScript, Selenium does a great job: you find the tags that contain the relevant information and use

driver.find_elements_by...

On this site, however, the listings are generated by AngularJS. I tried the same approach:

for article in driver.find_elements_by_css_selector("div.property.ng-scope"):
    pass  # do something with each listing

I found that I had to make my webdriver (PhantomJS) click the link that leads to each individual listing page:

linkbase = article.find_element_by_css_selector("div.info.clear.ng-scope")
link = linkbase.find_element_by_tag_name('a')
link.click()

The webdriver then simply navigates to that page, and I can get all the information I want for that one listing.

As soon as the loop is through, I get the following error:

> Message: {"errorMessage":"Element does not exist in cache","request":{"headers":
{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","
Content-Length":"142","Content-Type":"application/json;charset=UTF-8","Host":"12
7.0.0.1:56577","User-Agent":"Python-urllib/3.4"},"httpVersion":"1.1","method":"P
OST","post":"{\"sessionId\": \"f9ec2c10-dfd9-11e5-9d4c-3bbe8f5bf7c0\", \"using\"
: \"css selector\", \"id\": \":wdc:1456856343349\", \"value\": \"div.info.clear.
ng-scope\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"elemen
t","directory":"/","path":"/element","relative":"/element","port":"","host":"","
password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/ele
ment","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/f9ec2c10-dfd9-
11e5-9d4c-3bbe8f5bf7c0/element/:wdc:1456856343349/element"}}
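
"Element does not exist in cache" is how PhantomJS's GhostDriver typically reports what other drivers call a stale element reference. The element handles collected before link.click() belong to the results page and stop being valid once the browser navigates away. One workaround, sketched below, is to re-locate the listings on every pass instead of reusing the old handles. It assumes the results page shows the same listings in the same order after going back. The function name visit_by_index and the scrape callback are hypothetical, and the string "css selector" is the value of Selenium's By.CSS_SELECTOR.

```python
def visit_by_index(driver, listing_selector, link_selector, scrape):
    """Click through each listing, re-locating elements after every navigation."""
    results = []
    # Count once up front; assumes the number of listings stays the same.
    count = len(driver.find_elements("css selector", listing_selector))
    for index in range(count):
        # Fresh lookup: handles found before the previous click are no longer valid.
        listings = driver.find_elements("css selector", listing_selector)
        if index >= len(listings):
            break  # the page rendered fewer listings than expected
        link = listings[index].find_element("css selector", link_selector)
        link.click()
        results.append(scrape(driver))  # scrape is a caller-supplied function
        driver.back()  # return to the results page before the next lookup
    return results
```

On an AngularJS page, a WebDriverWait after driver.back() would probably be needed as well, since the list is rendered asynchronously.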

The element on the page that contains the link is:

<a ng-href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532" ng-click="beforeOpen(i.iterator, i.regionTip)" class="title" href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532">
<span class="name ng-binding"> ... </a>

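This is just the title text of each listing. I did set a user agent as described in this answer, even though it does not show up in the error. I also wait for the surrounding element to load beforehand: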

wait = WebDriverWait(driver, getSearchResults_CZ.waiting)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.content")))

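What I want is to parse all of these property elements, save their links to a list, and then iterate over that list, opening each link with driver.get(). I know that clicking a link changes the driver's URL, but I thought that once the list of articles had been built with find_elements_by, it would serve as a stable reference point. In this case, accessing the link by searching for the 'a' tag and calling get_attribute('href') does not work with the AngularJS framework. What am I not seeing?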
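
The plan described above (collect every href first, then open each one with driver.get()) can be sketched roughly as follows. The href strings are read before any navigation happens, so no element handle has to survive a page change. The helper names collect_links and visit_links and the scrape callback are hypothetical, and "css selector" is the value of Selenium's By.CSS_SELECTOR.

```python
def collect_links(driver, link_selector):
    """Return the href of every element matching link_selector."""
    hrefs = []
    for element in driver.find_elements("css selector", link_selector):
        href = element.get_attribute("href")
        if href:  # skip anchors that have no href
            hrefs.append(href)
    return hrefs


def visit_links(driver, hrefs, scrape):
    """Open each URL with driver.get() and collect whatever scrape returns."""
    results = []
    for href in hrefs:
        driver.get(href)  # plain strings stay usable after navigation
        results.append(scrape(driver))  # scrape is a caller-supplied function
    return results
```

A call might look like visit_links(driver, collect_links(driver, "div.property div.info a"), scrape), where scrape is whatever function extracts the data from a single listing page.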

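Edit: As answered, get_attribute without .click() is the right way to do it. My initial mistake was in the CSS selector: I had been using "div[class^='property']" and got a completely different link. It must have matched another element that I had not noticed before.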
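
The difference between the two selectors can be illustrated without a browser. In CSS, div.property matches when "property" is one of the whitespace-separated class names, whereas div[class^='property'] matches when the raw class attribute string begins with "property". The small sketch below mimics both rules in plain Python. The class strings are hypothetical examples, not taken from the real page.

```python
def matches_class_token(class_attr, token):
    """Mimic `div.token`: token must be one of the whitespace-separated classes."""
    return token in class_attr.split()


def matches_attr_prefix(class_attr, prefix):
    """Mimic `div[class^='prefix']`: the raw attribute must start with prefix."""
    # In CSS, an empty prefix matches nothing, hence the bool() guard.
    return bool(prefix) and class_attr.startswith(prefix)


# Hypothetical class attributes, invented for illustration only.
samples = ["property ng-scope", "property-list ng-scope", "ng-scope property"]

token_hits = [s for s in samples if matches_class_token(s, "property")]
prefix_hits = [s for s in samples if matches_attr_prefix(s, "property")]

print(token_hits)
print(prefix_hits)
```

Here the prefix rule also picks up "property-list ng-scope" and misses "ng-scope property", which is the kind of mismatch that could lead to an unexpected element.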

Answer 1

Wait until at least one "property" is visible, then get the links:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

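driver = webdriver.Firefox()
driver.get("http://www.sreality.cz/hledani/prodej/domy?region=jemnice")
driver.maximize_window()

# Wait up to 10 seconds for the first listing to become visible.
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "property")))

# Same lookup as find_elements_by_css_selector(), which newer Selenium releases no longer provide.
# Read the href strings right away; no click is needed.
links = [link.get_attribute("href") for link in driver.find_elements(By.CSS_SELECTOR, "div.property div.info a")]
print(links)

driver.close()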

Works for me.

Solution 1
1 accepted 2016-03-01 20:12:40
