How to print the href attribute with BeautifulSoup while automating with Selenium?

Question

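I want to get the href value of the element highlighted in blue in the page's HTML.

I have tried several ways of printing the links, but none of them worked.

My code: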

discover_page = BeautifulSoup(r.text, 'html.parser')

finding_accounts = discover_page.find_all("a", class_="author track")
print(len(finding_accounts))

finding_accounts = discover_page.find_all('a[class="author track"]')
print(len(finding_accounts))

accounts = discover_page.select('a', {'class': 'author track'})['href']
print(len(accounts))

Output:
0
0
TypeError: 'dict' object is not callable
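For reference, here is a minimal sketch of the same three lookups written the way BeautifulSoup expects them. It runs against a small inline HTML snippet; the snippet and its hrefs are hypothetical stand-ins for the real page. `find_all()` takes a tag name plus attribute filters, while a CSS selector string belongs in `select()`. Passing `'a[class="author track"]'` to `find_all()` searches for a tag literally named that, so it finds nothing. `select()` does not accept an attribute dict as its second argument (in older bs4 releases that slot was an internal parameter, which is most likely where `'dict' object is not callable` came from), and it returns a list, so `['href']` has to be read from each element. Even the correct first call returns 0 if the fetched HTML simply does not contain these links.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the Discover page markup.
html = """
<div>
  <a class="author track" href="/cafelab">cafelab</a>
  <a class="author track" href="/beeple">beeple</a>
  <a class="other" href="/ignored">ignored</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. find_all(): tag name plus a class filter. A string containing a space
#    is matched against the full class attribute value "author track".
by_find_all = [a["href"] for a in soup.find_all("a", class_="author track")]

# 2. select(): a real CSS selector; "a.author.track" requires both classes.
by_select = [a["href"] for a in soup.select("a.author.track")]

# 3. select_one() returns the first match (or None); get() returns None
#    instead of raising KeyError when the attribute is missing.
first_href = soup.select_one("a.author.track").get("href")

print(by_find_all)  # ['/cafelab', '/beeple']
print(by_select)    # ['/cafelab', '/beeple']
print(first_href)   # /cafelab
```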

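The page URL is https://society6.com/discover, but after I log in to my account the URL changes to https://society6.com/society?show=2

What am I doing wrong here?

Note: I am using Selenium with Chrome here. The answers given below work in my terminal, but not when I run the file.

My full code: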

from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup
import lxml

driver = webdriver.Chrome()
driver.get("https://society6.com/login?done=/")
username = driver.find_element_by_id('email')
username.send_keys("YOUR_EMAIL")
password = driver.find_element_by_id('password')
password.send_keys("YOUR_PASSWORD")
driver.find_element_by_name('login').click()

time.sleep(5)

driver.find_element_by_link_text('My Society').click()
driver.find_element_by_link_text('Discover').click()

time.sleep(5)

r = requests.get(driver.current_url)
r.raise_for_status()

'''discover_page = BeautifulSoup(r.html.raw_html, 'html.parser')

finding_accounts = discover_page.find_all("a", class_="author track")
print(len(finding_accounts))

finding_accounts = discover_page.find_all('a[class="author track"]')
print(len(finding_accounts))


links = []
for a in discover_page.find_all('a', class_ = 'author track'): 
        links.append(a['href'])
        #links.append(a.get('href'))

print(links)'''

#discover_page.find_all('a')

links = []
for a in discover_page.find_all("a", attrs = {"class": "author track"}): 
        links.append(a['href'])
        #links.append(a.get('href'))

print(links)

#soup.find_all("a", attrs = {"class": "author track"})'''

soup = BeautifulSoup(r.content, "lxml")
a_tags = soup.find_all("a", attrs={"class": "author track"})

for a in soup.find_all('a',{'class':'author track'}):
    print('https://society6.com'+a['href'])

The code inside the triple-quoted string is what I have been experimenting with.

Answer 1

If you want to find all the links without hand-rolling the search in BeautifulSoup, use requests-HTML.

Sample code to get all links:

from requests_html import HTMLSession
from bs4 import BeautifulSoup

url = 'https://society6.com/discover'
session = HTMLSession(mock_browser=True)
r = session.get(url, headers={'User-Agent': 'Mozilla/5.0'})

print(r.html.links)
print(r.html.absolute_links)

soup = BeautifulSoup(r.html.raw_html, 'html.parser')
a_tags = soup.find_all("a", attrs={"class": "author track"})
for a_tag in a_tags:
    print(a_tag['href'])

Answer 2

import requests
from bs4 import BeautifulSoup

data = requests.get('https://society6.com/discover')
soup_data = BeautifulSoup(data.content, "lxml")

for a in soup_data.find_all('a',{'class':'author track'}):
    print('https://society6.com'+a['href'])
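Answer 2 builds absolute URLs by string concatenation, which assumes every `href` is site-relative. A slightly more robust sketch uses the standard library's `urllib.parse.urljoin`, which resolves relative paths against the site root and leaves already-absolute links unchanged. The function name and the sample hrefs below are hypothetical.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE = "https://society6.com"

def absolute_author_links(html, base=BASE):
    """Return absolute URLs for every <a class="author track"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    # urljoin resolves "/cafelab" against the base and keeps full URLs as-is;
    # anchors without an href attribute are skipped.
    return [urljoin(base, a["href"])
            for a in soup.find_all("a", class_="author track")
            if a.has_attr("href")]

sample = ('<a class="author track" href="/cafelab">x</a>'
          '<a class="author track" href="https://society6.com/beeple">y</a>')
print(absolute_author_links(sample))
# ['https://society6.com/cafelab', 'https://society6.com/beeple']
```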

Answer 3

Since you want to print the href of the desired elements, you can use the following solution with Selenium alone:

Code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
# Selenium 4 style: driver path via a Service object, options via options=
driver = webdriver.Chrome(service=Service(r"C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe"), options=options)

driver.get("https://society6.com/login?done=/")
# Wait for the login form, then submit the credentials
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#email"))).send_keys("YOUR_EMAIL")
driver.find_element(By.CSS_SELECTOR, "input#password").send_keys("YOUR_PASSWORD")
driver.find_element(By.CSS_SELECTOR, "button[name='login']").click()

# Navigate: My Society -> Discover
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#nav-user-my-society>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "Discover"))).click()

# Wait until the author links are visible, then print each href
hrefs_elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.author.track")))
for element in hrefs_elements:
    print(element.get_attribute("href"))

Console output:

https://society6.com/pivivikstrm
https://society6.com/cafelab
https://society6.com/cafelab
https://society6.com/colorandcolor
https://society6.com/83oranges
https://society6.com/aftrdrk
https://society6.com/alaskanmommabear
https://society6.com/thindesign
https://society6.com/colorandcolor
https://society6.com/aftrdrk
https://society6.com/aljahorvat
https://society6.com/bribuckley
https://society6.com/hennkim
https://society6.com/franciscomffonseca
https://society6.com/83oranges
https://society6.com/nadja1
https://society6.com/beeple
https://society6.com/absentisdesigns
https://society6.com/alexandratarasoff
https://society6.com/artdekay880
https://society6.com/annaki
https://society6.com/cafelab
https://society6.com/bribuckley
https://society6.com/bitart
https://society6.com/draw4you
https://society6.com/cafelab
https://society6.com/beeple
https://society6.com/burcukorkmazyurek
https://society6.com/absentisdesigns
https://society6.com/deanng
https://society6.com/beautifulhomes
https://society6.com/aftrdrk
https://society6.com/printsproject
https://society6.com/bluelela
https://society6.com/anipani
https://society6.com/ecmazur
https://society6.com/batkei
https://society6.com/menchulica
https://society6.com/83oranges
https://society6.com/7115
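Answer 3 reads the attributes through Selenium alone. To use BeautifulSoup as the title asks, one option is to parse `driver.page_source` (the DOM as the logged-in browser currently sees it) instead of calling `requests.get(driver.current_url)`. `requests` opens a separate HTTP session that does not share the browser's login cookies and does not execute JavaScript, so it very likely receives different HTML from what the browser shows. The sketch below assumes Selenium 4, reuses the `a.author.track` selector and the explicit wait from Answer 3, and leaves out the login steps; the helper names `author_links` and `scrape_discover` are hypothetical. Only the parsing function runs without a browser.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def author_links(page_source, base="https://society6.com"):
    """Parse rendered HTML and return unique absolute author links, in page order."""
    soup = BeautifulSoup(page_source, "html.parser")
    # [href] skips anchors that have no href attribute at all.
    hrefs = (urljoin(base, a["href"]) for a in soup.select("a.author.track[href]"))
    # dict.fromkeys drops duplicates (the same artist can appear several times)
    # while preserving first-seen order.
    return list(dict.fromkeys(hrefs))

def scrape_discover(driver, timeout=20):
    """Hypothetical helper: call after the Selenium login/navigation steps."""
    # Imported here so author_links() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Wait until the author links are actually rendered in the browser...
    WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.author.track"))
    )
    # ...then hand the browser's current DOM to BeautifulSoup.
    return author_links(driver.page_source)
```

Usage, after the login and navigation steps from the question or Answer 3: `print(scrape_discover(driver))`. Deduplicating is optional; drop the `dict.fromkeys` step to keep every occurrence, as in the console output above.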

3 answers:

Answer 1: score 0, posted 2018-09-16 06:33:07

Answer 2: score 0, posted 2018-09-16 06:52:11

Answer 3: score 0, posted 2018-09-16 11:51:47
