简体   繁体   中英

How to find the href attribute of the videos on twitch through selenium and python?

I'm trying to find the twitch video IDs of all videos for a specific user. So for example on this page https://www.twitch.tv/dyrus/videos/all

So here we have all videos linked, but its not quite so simple as to just scrape the html and find the links since they are generated dynamically it seems.

So I heard about selenium and did something like this:

from selenium import webdriver

# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver') 
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_element = driver.find_elements_by_xpath("//*[@href]")


for link in link_element:
    print(link.get_attribute('href'))

driver.close()

This returns me a bunch of links on the page but not the videos, they lie "deeper" I think, any input?

Thanks in advance

With your locator, you are returning every element on the page that contains an href attribute. You can be a little more specific than that and get what you are looking for. Switch to a CSS selector...

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC    

# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver') 
driver.get('https://www.twitch.tv/dyrus/videos/all')
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-a-target='preview-card-image-link']")))

for link in links:
    print(link.get_attribute('href'))

driver.close()

That prints 40 links from the page.

I would still suggest a couple of changes as follows:

  • Always open the Web Browser in maximized mode so that all/majority of the desired elements are within the Viewport .
  • If you are on Windows OS you need to append the extension .exe at the end of the WebDriver variant name, eg chromedriver.exe
  • While you identify for elements always try to include the class attribute in your Locator Strategy .
  • Always invoke driver.quit() at the end of your @Test to close & destroy the WebDriver and Web Client instances gracefully.
  • Here is your own code block with the above mentioned tweaks:

     from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC options = Options() options.add_argument("start-maximized") options.add_argument("disable-infobars") driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\\path\\to\\chromedriver.exe') driver.get('https://www.twitch.tv/dyrus/videos/all') link_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.tw-interactive.tw-link[data-a-target='preview-card-image-link']"))) for link in link_elements: print(link.get_attribute('href')) driver.quit() 
  • Console Output:

     https://www.twitch.tv/videos/295314690 https://www.twitch.tv/videos/294901947 https://www.twitch.tv/videos/294472813 https://www.twitch.tv/videos/294075254 https://www.twitch.tv/videos/293617036 https://www.twitch.tv/videos/293236560 https://www.twitch.tv/videos/292800601 https://www.twitch.tv/videos/292409437 https://www.twitch.tv/videos/292328170 https://www.twitch.tv/videos/292032996 https://www.twitch.tv/videos/291625563 https://www.twitch.tv/videos/291192151 https://www.twitch.tv/videos/290824842 https://www.twitch.tv/videos/290434348 https://www.twitch.tv/videos/290021370 https://www.twitch.tv/videos/289561690 https://www.twitch.tv/videos/289495488 https://www.twitch.tv/videos/289138003 https://www.twitch.tv/videos/289110429 https://www.twitch.tv/videos/288804893 https://www.twitch.tv/videos/288784992 https://www.twitch.tv/videos/288687479 https://www.twitch.tv/videos/288432438 https://www.twitch.tv/videos/288117849 https://www.twitch.tv/videos/288004968 https://www.twitch.tv/videos/287689102 https://www.twitch.tv/videos/287451192 https://www.twitch.tv/videos/287267032 https://www.twitch.tv/videos/287017431 https://www.twitch.tv/videos/286819343 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM