Selenium WebDriver with Python cannot find elements when scraping a dynamic page

Question

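There are many questions on Stack Overflow about scraping dynamic content. I went through all of them, but none of the suggested solutions work for the following problem: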

Context:

Using Selenium WebDriver in Python
I mainly relied on this resource: http://selenium-python.readthedocs.org/page-objects.html (its Python.org example).
Page to scrape: http://propertymap.sfplanning.org/

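Problem:

I cannot access any DOM element on this page. Some hints on how to access the search bar and the search button would already be a great start. Ultimately, I want to go through a list of addresses, launch a search for each one, and copy the information displayed on the right side of the screen.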

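I tried the following:

Changed the webdriver's browser (from Chrome to Firefox)

Increased the wait time for the page to load:

try:
    WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.ID, "addressInput")))
except TimeoutException:
    print("address input not found")

Tried to access the element by ID, XPATH, NAME, TAG NAME, etc., but nothing worked.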

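Questions

What have I not tried yet (with Selenium WebDriver)?
Are some websites really impossible to scrape? (I don't think the city uses an algorithm that generates a random DOM every time I reload the page.)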

Answer 1

You can get the source from this URL: http://50.17.237.182/PIM/

In [73]: from selenium import webdriver


In [74]: dr = webdriver.PhantomJS()

In [75]: dr.get("http://50.17.237.182/PIM/")

In [76]: print(dr.find_element_by_id("addressInput"))
<selenium.webdriver.remote.webelement.WebElement object at 0x7f4d21c80950>

If you look at the source returned for http://propertymap.sfplanning.org/, you will find a frame whose src attribute is that URL:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>

<head>
  <title>San Francisco Property Information Map </title>
  <META name="description" content="Public access to useful property information and resources at the click of a mouse"><META name="keywords" content="san francisco, property, information, map, public, zoning, preservation, projects, permits, complaints, appeals">
</head>
<frameset rows="100%,*" border="0">
  <frame src="http://50.17.237.182/PIM" frameborder="0" />
  <frame frameborder="0" noresize />
</frameset>

<!-- pageok -->
<!-- 02 -->
<!-- -->
</html>
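The frame URL can also be found programmatically rather than by reading the source. A minimal sketch using only the standard library's html.parser; `FrameSrcCollector` and `frame_sources` are hypothetical names, and `FRAMESET_HTML` is a trimmed copy of the source above:

```python
from html.parser import HTMLParser


class FrameSrcCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs.
        # Self-closing tags like <frame ... /> are routed here as well, because
        # the default handle_startendtag calls handle_starttag.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:  # skip frames without a src, like the second one below
                self.sources.append(src)


def frame_sources(html):
    parser = FrameSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Trimmed copy of the frameset returned by http://propertymap.sfplanning.org/
FRAMESET_HTML = """
<html>
<frameset rows="100%,*" border="0">
  <frame src="http://50.17.237.182/PIM" frameborder="0" />
  <frame frameborder="0" noresize />
</frameset>
</html>
"""

print(frame_sources(FRAMESET_HTML))  # ['http://50.17.237.182/PIM']
```

The printed URL is the one the session above loads directly.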

Thanks to @Alecxe, here is the simplest way, using dr.switch_to.frame(0):

In [77]: dr = webdriver.PhantomJS()

In [78]: dr.get("http://propertymap.sfplanning.org/")

In [79]:  dr.switch_to.frame(0)  

In [80]: print(dr.find_element_by_id("addressInput"))
<selenium.webdriver.remote.webelement.WebElement object at 0x7f4d21c80190>

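If you visit http://50.17.237.182/PIM/ in a browser, you will see exactly the same content as on propertymap.sfplanning.org/. The only difference is that with the former you have full access to the elements, because you are loading the framed page itself rather than the frameset that wraps it.

If you want to enter a value and click the search button, it goes like this: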

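from selenium import webdriver


dr = webdriver.PhantomJS()
dr.get("http://propertymap.sfplanning.org/")

# The search UI lives inside the first <frame>, so switch into it before any lookup
dr.switch_to.frame(0)

# Type an address into the search box, then press the search button
dr.find_element_by_id("addressInput").send_keys("whatever")
dr.find_element_by_xpath("//input[@title='Search button']").click()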

If you just want to pull the data, though, you may find that querying the URL directly is the simpler option: the query gives you back JSON.
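The answer does not name the query URL; one way to find it is the browser's developer tools (Network tab) while running a search on the page. A minimal sketch of replaying such a request with only the standard library; `QUERY_URL`, the `address` parameter, and `fetch_json` are hypothetical placeholders, not the site's real endpoint:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: copy the real request URL from the browser's Network tab
QUERY_URL = "http://example.invalid/query"


def fetch_json(base_url, params, opener=urllib.request.urlopen, timeout=10):
    """GET base_url?params and decode the JSON body.

    `opener` is injectable so the function can be exercised without network access.
    """
    url = base_url + "?" + urllib.parse.urlencode(params)
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json(QUERY_URL, {"address": "whatever"})` would issue one GET per address once `QUERY_URL` and the parameter name are replaced with the real ones, with no browser involved.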

Solution 1 (score: 2, accepted, 2016-03-28 23:35:56)