Scraping JavaScript in Python with Selenium and Beautiful Soup

Question

I'm trying to scrape a JavaScript-enabled page using Beautiful Soup (BS) and Selenium. I have the following code so far, but it somehow still doesn't pick up the JavaScript-rendered content and returns an empty result. In this case I'm trying to scrape the Facebook comments at the bottom of the page. (Inspect Element shows their class as postText.)
Thanks for the help!

from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  
import BeautifulSoup

browser = webdriver.Firefox()  
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')  
html_source = browser.page_source  
browser.quit()

soup = BeautifulSoup.BeautifulSoup(html_source)  
comments = soup("div", {"class":"postText"})  
print comments

Answer 1

There are some mistakes in your code; they are fixed below. However, the class "postText" must exist elsewhere, since it is not defined in the original source code. My revised version of your code was tested and works on multiple websites.

from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  
from bs4 import BeautifulSoup

browser = webdriver.Firefox()  
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')  
html_source = browser.page_source  
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')
# The class "postText" is not defined in the page source, so this list may be empty
comments = soup.find_all('div', {'class': 'postText'})
print(comments)
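
One possible reason "postText" is missing from page_source is that the comments are rendered inside an iframe, or are added by JavaScript after the page source has been read. That is an assumption about this page, not something confirmed above. Under that assumption, a minimal sketch could wait for the element and look inside each iframe as well as the top-level document. The helper names extract_post_text and collect_comments and the timeout value are made up for illustration; the Selenium calls assume Selenium 4.

```python
from bs4 import BeautifulSoup


def extract_post_text(html, css_class="postText"):
    """Return the stripped text of every <div> that carries css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True)
            for div in soup.find_all("div", class_=css_class)]


def collect_comments(url, css_class="postText", timeout=5):
    """Load url in Firefox and search the top document plus each iframe.

    Hypothetical helper: it assumes the comments may live in an iframe.
    """
    # Selenium is imported here so extract_post_text works without a browser.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # First look in the top-level document.
        found = extract_post_text(browser.page_source, css_class)

        frame_count = len(browser.find_elements(By.TAG_NAME, "iframe"))
        for index in range(frame_count):
            # Re-locate the frames each time; old references go stale
            # once the driver has switched context.
            browser.switch_to.default_content()
            frames = browser.find_elements(By.TAG_NAME, "iframe")
            if index >= len(frames):
                break
            browser.switch_to.frame(frames[index])
            try:
                # Give JavaScript up to `timeout` seconds to render the element.
                WebDriverWait(browser, timeout).until(
                    EC.presence_of_element_located((By.CLASS_NAME, css_class))
                )
            except TimeoutException:
                continue  # this frame has no such element
            # page_source now reflects the frame the driver switched into.
            found.extend(extract_post_text(browser.page_source, css_class))
        return found
    finally:
        browser.quit()
```

Calling collect_comments('http://techcrunch.com/2012/05/15/facebook-lightbox/') would then return a list of comment strings, or an empty list if no frame ever contains the class. The wait runs once per iframe, so a page with many frames can take a while; lowering timeout trades completeness for speed.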

Answer 1
7 2014-03-22 05:04:18
