Scraping JavaScript-generated content with Python and Selenium WebDriver

Question

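I'm trying to scrape ads from Ask that are generated inside an iframe by JavaScript hosted by Google.

When I browse manually and view the source, they are there (I'm specifically looking for a div with the id "adBlock" inside an iframe).

But when I try with Firefox, Chromedriver or FirefoxPortable, the source returned to me is missing all of the elements I'm looking for.

I tried scraping with urllib2 and got the same result, even after adding the necessary headers. I was sure that a physical browser instance, like the one Webdriver creates, would fix that problem.

Here's the code I'm working from, which I had to cobble together from a few different sources: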

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pprint

# Create a new instance of the Chrome driver
# (raw string so the backslashes in the Windows path are taken literally)
driver = webdriver.Chrome(r'C:\Python27\Chromedriver\chromedriver.exe')
driver.get("http://www.ask.com")

print(driver.title)
inputElement = driver.find_element_by_name("q")

# type in the search
inputElement.send_keys("baseball hats")
# submit the form (although google automatically searches now without submitting)
inputElement.submit()

try:
    WebDriverWait(driver, 10).until(EC.title_contains("baseball"))
    print(driver.title)
    output = driver.page_source
    print(output)
finally:
    driver.quit()

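I know I've got a few different attempts at viewing the source going on in there; that's not what I'm concerned about.

Any thoughts on why I'd get one result from this script (ads omitted) and a totally different result (ads present) from the browser it opens? I've tried Scrapy, Selenium, urllib2, etc. No joy.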

Answer 1

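Selenium only shows the contents of the current frame or iframe. To see what is inside the iframes, you have to switch into each of them, along these lines: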

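# Collect every iframe element in the top-level document
iframes = driver.find_elements_by_tag_name("iframe")

for iframe in iframes:
    # Go back to the top-level document before entering the next iframe
    driver.switch_to_default_content()
    driver.switch_to_frame(iframe)

    # page_source now returns the source of the iframe that was switched into
    output = driver.page_source
    print(output)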
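
For anyone reading this with a newer Selenium release (4.x), where the `find_elements_by_*` and `switch_to_*` helpers used above are no longer available, the same idea might look roughly like the sketch below. The function name `collect_frame_sources` is made up for illustration, and it only looks at top-level iframes, so an ad nested inside a second level of iframes would need the same switch repeated inside each frame. It also assumes the iframes have already been added to the page by the time it is called.

```python
def collect_frame_sources(driver):
    """Return the source of the top-level document followed by the source
    of each top-level iframe.

    `driver` is assumed to be a Selenium WebDriver; only `find_elements`,
    `switch_to` and `page_source` are used. The function name is
    hypothetical, not part of Selenium.
    """
    sources = []

    # Start from the top-level document and record its source first.
    driver.switch_to.default_content()
    sources.append(driver.page_source)

    # "tag name" is the locator string behind selenium's By.TAG_NAME;
    # By.TAG_NAME can be passed here instead when selenium is imported.
    iframes = driver.find_elements("tag name", "iframe")

    for iframe in iframes:
        # Return to the top-level document before entering the next iframe.
        driver.switch_to.default_content()
        driver.switch_to.frame(iframe)
        sources.append(driver.page_source)

    # Leave the driver pointing at the top-level document again.
    driver.switch_to.default_content()
    return sources
```

With a driver created and navigated as in the question, each string in the returned list can then be searched for `adBlock` to find which frame, if any, holds the ads.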


Answer 1: 3, accepted, 2014-01-30 02:38:11
