Getting links that contain certain words from a web page using BeautifulSoup and Selenium

Question

I wrote this code to log in to my FB account and get all group links on the page using Selenium and BeautifulSoup, but the BeautifulSoup part isn't working correctly.

I want to know how to use Selenium and BeautifulSoup in the same code.

I don't want to use the Facebook API; I want to use Selenium and BeautifulSoup.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer


usr = raw_input('--> ')
pwd = raw_input('--> ')
poo = raw_input('--> ')

driver = webdriver.Firefox()
# or you can use Chrome(executable_path="/usr/bin/chromedriver")
driver.get("https://www.facebook.com/groups/?category=membership")
assert "Facebook" in driver.title
elem = driver.find_element_by_id("email")
elem.send_keys(usr)
elem = driver.find_element_by_id("pass")
elem.send_keys(pwd)
elem.send_keys(Keys.RETURN)

scheight = .1
while scheight < 9.9:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
    scheight += .01
soup = BeautifulSoup(html)
http = httplib2.Http()
status, response = ('https://www.facebook.com/groups/?category=membership')

count = 0
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
     count = count + 1
print 'Count: ', count 

for tag in BeautifulSoup(('a')):
    if link.has_key('href'):
        if '/groups/' in link['href']:

          print link['href']


elem = driver.find_element_by_css_selector(".input.textInput")
elem.send_keys(poo)
elem = driver.find_element_by_css_selector(".selected")
elem.send_keys(Keys.RETURN)
elem.click()
time.sleep(5)

Answer 1

You never declared html.

Selenium's webdriver has a page_source attribute that holds the HTML of the currently loaded page, and you can pass it straight to BeautifulSoup:

soup = BeautifulSoup(driver.page_source)
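
To show how the two libraries can sit in the same script, here is a minimal sketch. It assumes BeautifulSoup 4 (the bs4 package) rather than the BeautifulSoup 3 import used in the question, and the function names group_links and fetch_group_links are made up for illustration. The login and scrolling steps from the question are left out.

```python
from bs4 import BeautifulSoup


def group_links(html):
    """Return the href of every <a> tag whose href contains '/groups/'."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all
    for a in soup.find_all("a", href=True):
        if "/groups/" in a["href"]:
            links.append(a["href"])
    return links


def fetch_group_links(url):
    """Load url in a browser and extract group links from the rendered page."""
    # Imported here so group_links() can be used without Selenium installed
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # The login and scrolling steps from the question would go here
        return group_links(driver.page_source)
    finally:
        driver.quit()
```

Keeping the parsing in its own function means it can be tried on a plain HTML string before a browser is involved.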

Update for second error

Your line,

status, response = ('https://www.facebook.com/groups/?category=membership')

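is not a function call: the parentheses only wrap a string, so Python tries to unpack that string into status and response. No request is ever made, so response never holds the page content. Since you create http = httplib2.Http() on the line above, you presumably meant to call http.request(...) there, although the page Selenium has already loaded is available through driver.page_source anyway.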
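
To see what that assignment does in isolation, here is a small sketch using only plain Python. The helper try_unpack is hypothetical and exists only to capture the outcome. Parentheses around a single string do not create a tuple, so Python attempts to unpack the string itself, one character per target.

```python
url = 'https://www.facebook.com/groups/?category=membership'


def try_unpack(value):
    """Attempt `status, response = value` and report what happens."""
    try:
        status, response = value
    except ValueError as exc:
        # Raised when value does not contain exactly two items
        return 'ValueError: %s' % exc
    return (status, response)


# (url) is just the string, not a one-element tuple
print(try_unpack((url)))

# A real two-item tuple unpacks as expected
print(try_unpack(('200', '<html>...</html>')))
```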

Answer 2

I think BeautifulSoup isn't returning the links, right?

I find links with BeautifulSoup like this:

soup = BeautifulSoup(html)  # html: the page source, e.g. driver.page_source
for i in soup.find_all('a'):
    href = i.get('href')  # None when the anchor has no href
    if href and '/groups/' in href:
        print(href)
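
The question also counts the anchors with a SoupStrainer. A rough equivalent is sketched below, assuming BeautifulSoup 4, where the parseOnlyThese argument is called parse_only. The function name count_and_filter and its needle parameter are illustrative, not part of either library.

```python
from bs4 import BeautifulSoup, SoupStrainer


def count_and_filter(html, needle="/groups/"):
    """Parse only <a> tags; return (anchor count, hrefs containing needle)."""
    only_anchors = SoupStrainer("a")
    soup = BeautifulSoup(html, "html.parser", parse_only=only_anchors)
    anchors = soup.find_all("a")
    # a.get("href", "") falls back to an empty string when href is missing
    hrefs = [a["href"] for a in anchors if needle in a.get("href", "")]
    return len(anchors), hrefs
```

Restricting the parse to anchors keeps the tree small on a long, scrolled page, which is presumably why the question reached for SoupStrainer in the first place.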

2 answers

Answer 1: 2015-03-18 21:12:53
Answer 2: 2015-03-21 00:48:08
