How to find the href attribute of the videos on Twitch through Selenium and Python?

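I am trying to find the Twitch video IDs of all videos for a specific user. For example, on this page: https://www.twitch.tv/dyrus/videos/all

All the videos are linked there, but it seems it is not as simple as scraping the HTML and finding the links, because they are generated dynamically.

So I heard about Selenium and did something like this: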

from selenium import webdriver

# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver') 
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_element = driver.find_elements_by_xpath("//*[@href]")


for link in link_element:
    print(link.get_attribute('href'))

driver.close()

This returns a bunch of links on the page, but not the videos. I think they are "deeper". Any input?

Thanks in advance.

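With that locator, you are returning every element on the page that has an href attribute. You can be more specific than that and get what you are looking for. Switching to a CSS selector...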

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC    

# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver') 
driver.get('https://www.twitch.tv/dyrus/videos/all')
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-a-target='preview-card-image-link']")))

for link in links:
    print(link.get_attribute('href'))

driver.close()

This prints 40 links from the page.
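The wait in the code above matters because the video links are generated dynamically and may not exist yet when the page first loads. As a rough illustration of the polling idea behind an explicit wait, here is a minimal sketch in plain Python. The helper name `poll_until` and the stand-in function `fake_find_links` are hypothetical, and this is not how Selenium implements `WebDriverWait`; it only shows the retry-until-found pattern.

```python
import time


def poll_until(fetch, timeout=10.0, interval=0.5):
    """Call fetch() repeatedly until it returns a non-empty result or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no result within %.1f seconds" % timeout)
        time.sleep(interval)


# Hypothetical stand-in for a page whose links only appear on the third look.
attempts = {"count": 0}


def fake_find_links():
    attempts["count"] += 1
    if attempts["count"] < 3:
        return []  # links not rendered yet
    return ["https://www.twitch.tv/videos/295314690"]


links = poll_until(fake_find_links, timeout=2.0, interval=0.01)
print(links)
```

With a real browser session, `WebDriverWait(...).until(...)` already does this job, so a hand-rolled loop like this is mainly useful for understanding why the first attempt in the question came back without the video links.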

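I would still suggest a few changes, as follows:

  • Always open the web browser in maximized mode so that all or most of the desired elements are within the viewport.
  • If you are on Windows, you need to append the extension .exe to the end of the WebDriver variant name, e.g. chromedriver.exe.
  • When identifying elements, always try to include the class attribute in your locator strategy.
  • Always call driver.quit() at the end of your @Test to close and destroy the WebDriver and web client instances gracefully.
  • Here is your own code block with the above tweaks: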

     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.support import expected_conditions as EC

     options = Options()
     options.add_argument("start-maximized")
     options.add_argument("disable-infobars")
     # Selenium 3 style arguments; Selenium 4 expects options= and a Service object instead
     driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
     driver.get('https://www.twitch.tv/dyrus/videos/all')
     # Wait up to 10 seconds for the dynamically generated video links to become visible
     link_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.tw-interactive.tw-link[data-a-target='preview-card-image-link']")))
     for link in link_elements:
         print(link.get_attribute('href'))
     driver.quit()
  • Console output:

     https://www.twitch.tv/videos/295314690
     https://www.twitch.tv/videos/294901947
     https://www.twitch.tv/videos/294472813
     https://www.twitch.tv/videos/294075254
     https://www.twitch.tv/videos/293617036
     https://www.twitch.tv/videos/293236560
     https://www.twitch.tv/videos/292800601
     https://www.twitch.tv/videos/292409437
     https://www.twitch.tv/videos/292328170
     https://www.twitch.tv/videos/292032996
     https://www.twitch.tv/videos/291625563
     https://www.twitch.tv/videos/291192151
     https://www.twitch.tv/videos/290824842
     https://www.twitch.tv/videos/290434348
     https://www.twitch.tv/videos/290021370
     https://www.twitch.tv/videos/289561690
     https://www.twitch.tv/videos/289495488
     https://www.twitch.tv/videos/289138003
     https://www.twitch.tv/videos/289110429
     https://www.twitch.tv/videos/288804893
     https://www.twitch.tv/videos/288784992
     https://www.twitch.tv/videos/288687479
     https://www.twitch.tv/videos/288432438
     https://www.twitch.tv/videos/288117849
     https://www.twitch.tv/videos/288004968
     https://www.twitch.tv/videos/287689102
     https://www.twitch.tv/videos/287451192
     https://www.twitch.tv/videos/287267032
     https://www.twitch.tv/videos/287017431
     https://www.twitch.tv/videos/286819343
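
Since the question asks for the video IDs rather than the full links, the printed hrefs still need one more step. A minimal sketch of that step, assuming the hrefs keep the /videos/<digits> shape seen in the console output; the helper name `video_id_from_href` and the pattern `VIDEO_PATH` are hypothetical and not part of Selenium or any Twitch API:

```python
import re
from urllib.parse import urlparse

# Assumed URL shape, taken from the console output: /videos/<digits>
VIDEO_PATH = re.compile(r"^/videos/(\d+)/?$")


def video_id_from_href(href):
    """Return the numeric video ID as a string, or None if href is not a video link."""
    match = VIDEO_PATH.match(urlparse(href).path)
    return match.group(1) if match else None


# Two links from the output above plus the listing page itself, which is not a video.
hrefs = [
    "https://www.twitch.tv/videos/295314690",
    "https://www.twitch.tv/videos/294901947",
    "https://www.twitch.tv/dyrus/videos/all",
]

video_ids = [vid for vid in map(video_id_from_href, hrefs) if vid is not None]
print(video_ids)
```

In the Selenium script, the `hrefs` list would come from `link.get_attribute('href')` for each located element. Returning None for anything that does not match keeps unrelated links from being mistaken for videos.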
