How to find the href attribute of the videos on Twitch through Selenium and Python?

I'm trying to find the Twitch video IDs of all videos for a specific user, for example on this page: https://www.twitch.tv/dyrus/videos/all

All the videos are linked there, but it's not as simple as scraping the HTML and finding the links, since they seem to be generated dynamically.

So I heard about Selenium and tried something like this:

from selenium import webdriver

# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver') 
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_element = driver.find_elements_by_xpath("//*[@href]")


for link in link_element:
    print(link.get_attribute('href'))

driver.close()

This returns a bunch of links from the page, but not the video links. I think they lie "deeper". Any input?

Thanks in advance.

With your locator, you are returning every element on the page that has an href attribute. You can be more specific than that and get just what you are looking for. Switch to a CSS selector and wait for the matching links to become visible:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC    

# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver') 
driver.get('https://www.twitch.tv/dyrus/videos/all')
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-a-target='preview-card-image-link']")))

for link in links:
    print(link.get_attribute('href'))

driver.close()

That prints 40 links from the page.
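
The wait matters because the video links are generated dynamically, so a lookup made right after driver.get() can run before they exist. The sketch below shows the polling idea behind an explicit wait in plain Python. It is only an illustration, not Selenium's implementation; wait_until and its timeout and poll parameters are hypothetical names chosen for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        # Give the page time to render more elements before retrying.
        time.sleep(poll)
```

Here condition can be any zero-argument callable, for example one that looks up the video links and returns the list of elements found (an empty list counts as "not ready yet").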

I would still suggest a couple of changes, as follows:

  • Always open the web browser in maximized mode so that all or most of the desired elements are within the viewport.
  • If you are on Windows, you need to append the .exe extension to the WebDriver binary name, e.g. chromedriver.exe.
  • When identifying elements, try to include the class attribute in your locator strategy.
  • Always invoke driver.quit() at the end of your script to close and destroy the WebDriver and web client instances gracefully.
  • Here is your own code block with the above-mentioned tweaks:

     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.support import expected_conditions as EC

     # Start Chrome maximized so most video cards fall inside the viewport
     options = Options()
     options.add_argument("start-maximized")
     options.add_argument("disable-infobars")

     # chrome_options and executable_path are the Selenium 3 keyword names
     driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
     driver.get('https://www.twitch.tv/dyrus/videos/all')

     # Wait up to 10 seconds for the video links to become visible
     link_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.tw-interactive.tw-link[data-a-target='preview-card-image-link']")))

     for link in link_elements:
         print(link.get_attribute('href'))

     # quit() closes the browser and ends the WebDriver session
     driver.quit()
  • Console output:

     https://www.twitch.tv/videos/295314690
     https://www.twitch.tv/videos/294901947
     https://www.twitch.tv/videos/294472813
     https://www.twitch.tv/videos/294075254
     https://www.twitch.tv/videos/293617036
     https://www.twitch.tv/videos/293236560
     https://www.twitch.tv/videos/292800601
     https://www.twitch.tv/videos/292409437
     https://www.twitch.tv/videos/292328170
     https://www.twitch.tv/videos/292032996
     https://www.twitch.tv/videos/291625563
     https://www.twitch.tv/videos/291192151
     https://www.twitch.tv/videos/290824842
     https://www.twitch.tv/videos/290434348
     https://www.twitch.tv/videos/290021370
     https://www.twitch.tv/videos/289561690
     https://www.twitch.tv/videos/289495488
     https://www.twitch.tv/videos/289138003
     https://www.twitch.tv/videos/289110429
     https://www.twitch.tv/videos/288804893
     https://www.twitch.tv/videos/288784992
     https://www.twitch.tv/videos/288687479
     https://www.twitch.tv/videos/288432438
     https://www.twitch.tv/videos/288117849
     https://www.twitch.tv/videos/288004968
     https://www.twitch.tv/videos/287689102
     https://www.twitch.tv/videos/287451192
     https://www.twitch.tv/videos/287267032
     https://www.twitch.tv/videos/287017431
     https://www.twitch.tv/videos/286819343
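
Since the question asks for the video IDs rather than the full links, the hrefs still need one more step. A minimal sketch, assuming every video URL has the form /videos/<digits> as in the output shown here (Twitch does not guarantee this); video_ids and VIDEO_URL are hypothetical names for this example, not part of Selenium.

```python
import re

# Assumption: a video URL contains /videos/<digits>, optionally followed
# by a slash, query string, or fragment.
VIDEO_URL = re.compile(r"/videos/(\d+)(?:[/?#]|$)")


def video_ids(hrefs):
    """Return the unique video IDs found in hrefs, in first-seen order."""
    seen = {}
    for href in hrefs:
        # get_attribute('href') can return None, so guard against it.
        match = VIDEO_URL.search(href or "")
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)
```

With either answer's code, this could be called as video_ids(link.get_attribute('href') for link in links) in place of the print loop.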
