Python Selenium: a for loop that reads values and creates namedtuples takes too much time

Question

Following up on my previous question, I have managed to complete a few small parts of my task.

This is what I have put together so far:

import os
from collections import namedtuple
from operator import itemgetter
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

song = namedtuple('song', 'artist title album duration artistlink songlink albumlink')

path = os.environ['APPDATA'] + r'\Mozilla\Firefox\Profiles'  # raw string: backslashes are kept literally
path = (path + '\\' + os.listdir(path)[0]).replace('\\', '/')
profile = webdriver.FirefoxProfile(path)

Firefox = webdriver.Firefox(profile)
wait = WebDriverWait(Firefox, 30)

Firefox.get('https://music.163.com/#/playlist?id=158624364&userid=126762751')

iframe = Firefox.find_element_by_xpath('//iframe[@id="g_iframe"]')
Firefox.switch_to.frame(iframe)

wait.until(EC.visibility_of_element_located((By.XPATH, '//table/tbody/tr')))

rows = Firefox.find_elements_by_xpath('//table/tbody/tr')

entries = []

for row in rows:
    column1 = row.find_element_by_xpath('td[2]/div/div/div/span/a')
    title = column1.text
    songlink = column1.get_attribute('href')
    duration = row.find_element_by_xpath('td[3]/span').text
    column3 = row.find_element_by_xpath('td[4]/div/span/a')
    artist = column3.text
    artistlink = column3.get_attribute('href')
    column4 = row.find_element_by_xpath('td[5]/div/a')
    album = column4.text
    albumlink = column4.get_attribute('href')
    entries.append(song(artist, title, album, duration, artistlink, songlink, albumlink))

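The wait is required because the JavaScript needs some time to load all the entries; if the table is scraped too early, it contains at most 1000 songs.

My concern is the loop: it takes more than three minutes to process just 2748 entries.

This line: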

rows = Firefox.find_elements_by_xpath('//table/tbody/tr')

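It retrieves the whole table very quickly (in under three seconds), but I don't understand why using several find_element_by_xpath() and get_attribute() calls inside the loop makes the code so slow.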

Is calling these methods many times in a short period too heavy for the browser, or is creating namedtuples slow in itself?
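The second hypothesis can be checked without a browser. The following is a minimal sketch that times the creation of 2748 namedtuples on their own; the field values are placeholder strings (an assumption, not data from the playlist), and the measured time will vary by machine.

```python
import time
from collections import namedtuple

song = namedtuple('song', 'artist title album duration artistlink songlink albumlink')

# Build as many namedtuples as the playlist has entries, using dummy values,
# so that only the cost of namedtuple creation is measured.
start = time.perf_counter()
entries = [
    song('artist', 'title', 'album', '03:21', 'artistlink', 'songlink', 'albumlink')
    for _ in range(2748)
]
elapsed = time.perf_counter() - start

print(f'{len(entries)} namedtuples created in {elapsed:.4f} s')
```

If this finishes in a small fraction of a second, the namedtuple creation is unlikely to account for the three minutes, which points at the per-row browser calls instead.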

How can I optimize this?
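For context, each find_element_by_xpath(), .text and get_attribute() call in the loop is sent to the browser as a separate WebDriver command, so the loop issues about 11 commands per row, roughly 30,000 for 2748 rows. One possible way to cut this down is to collect every row inside the page with a single execute_script() call. The sketch below assumes that approach; the helper name scrape_songs, the constant EXTRACT_JS and the CSS selectors are hypothetical, chosen as approximate equivalents of the XPath expressions in the question, and textContent may differ slightly from Selenium's .text (which returns only visible text).

```python
from collections import namedtuple

song = namedtuple('song', 'artist title album duration artistlink songlink albumlink')

# JavaScript executed inside the page. The selectors are assumed equivalents
# of the XPath expressions in the question and may need adjusting.
EXTRACT_JS = """
return Array.from(document.querySelectorAll('table tbody tr')).map(function (tr) {
    function link(sel) {
        var a = tr.querySelector(sel);
        return a ? [a.textContent.trim(), a.href] : ['', ''];
    }
    var d = tr.querySelector('td:nth-child(3) span');
    var t = link('td:nth-child(2) a');
    var ar = link('td:nth-child(4) a');
    var al = link('td:nth-child(5) a');
    return [ar[0], t[0], al[0], d ? d.textContent.trim() : '', ar[1], t[1], al[1]];
});
"""


def scrape_songs(driver):
    """Fetch all rows with one execute_script() call and wrap them in namedtuples."""
    raw_rows = driver.execute_script(EXTRACT_JS)
    # Each raw row is a list in the same order as the namedtuple fields.
    return [song(*row) for row in raw_rows]
```

With the code from the question, this would be called as entries = scrape_songs(Firefox) after switching into the iframe and after the wait has completed, since execute_script() runs in the currently selected frame.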

Answer 1

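This is not about the speed of your code, but about its correctness.
In the for loop you search inside a specific row on each iteration, but I'm not sure you are getting what you expect.
When searching for a child element inside a parent element, you should start the XPath with a dot (.), which means "from here", i.e. starting at this node. Otherwise, relative XPaths such as td[2]/div/div/div/span/a will be evaluated relative to the entire web page.
You can find an explanation of this here.
Please try this and let me know whether it changes anything: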

for row in rows:
    column1 = row.find_element_by_xpath('.//td[2]/div/div/div/span/a')
    title = column1.text
    songlink = column1.get_attribute('href')
    duration = row.find_element_by_xpath('.//td[3]/span').text
    column3 = row.find_element_by_xpath('.//td[4]/div/span/a')
    artist = column3.text
    artistlink = column3.get_attribute('href')
    column4 = row.find_element_by_xpath('.//td[5]/div/a')
    album = column4.text
    albumlink = column4.get_attribute('href')
    entries.append(song(artist, title, album, duration, artistlink, songlink, albumlink))
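The row-scoped .// form used above can be tried without a browser. This is a minimal sketch using the standard library's xml.etree.ElementTree on a made-up two-row table; the markup and values are hypothetical, and ElementTree supports only a subset of XPath, so it stands in for the browser's XPath engine rather than reproducing it exactly.

```python
import xml.etree.ElementTree as ET

# A made-up table with the same row/cell layout idea as the playlist page.
markup = """<table><tbody>
<tr><td>1</td><td><a href="/song?id=1">Song A</a></td></tr>
<tr><td>2</td><td><a href="/song?id=2">Song B</a></td></tr>
</tbody></table>"""

table = ET.fromstring(markup)
rows = table.findall('./tbody/tr')

titles = []
links = []
for row in rows:
    # The leading "." anchors the search at the current row,
    # so each iteration only sees cells that belong to that row.
    anchor = row.find('.//td[2]/a')
    titles.append(anchor.text)
    links.append(anchor.get('href'))

print(titles, links)
```

Each row yields its own title and link, which is the per-row behaviour the loop is meant to have.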

Solution 1
1 2021-06-07 12:41:08
