
Getting Dynamic Table Data With Selenium Python

So I am trying to parse data from a dynamic table with Selenium, but it keeps returning the old data from page 1 when I try to gather page 2's data. I've searched for other answers but haven't found any; some say I need to add a wait period, and I did, however that didn't work.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get('https://www.nyse.com/listings_directory/stock')

symbol_list = []

table_data = browser.find_elements_by_xpath("//td")

def append_to_list(data):
    for element in data:
        symbol_list.append(element.text)

append_to_list(table_data)

pages = browser.find_elements_by_xpath('//a[@href="#"]')

for page in pages:
    if page.get_attribute("rel") == "next":
        if page.text == "NEXT ›":
            page.click()
            browser.implicitly_wait(100)
            for elem in browser.find_elements_by_xpath("//td"):  # still fetches the data from page 1
                print(elem.text)
            #print(symbol_list)

I modified your script as below.

You should retrieve the elements inside the for loop, or it will cause a stale element reference exception.

And use WebDriverWait to wait for the elements to be visible before finding them.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from time import sleep

browser = webdriver.Chrome()
browser.get('https://www.nyse.com/listings_directory/stock')

symbol_list = []


while True:
    try:
        table_data = WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//table//td")))
        for i in range(1, len(table_data)+1):
            td_text = browser.find_element_by_xpath("(//table//td)["+str(i)+"]").text
            print(td_text)
            symbol_list.append(td_text)
        next_page = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[@href="#" and contains(text(),"Next")]')))
        next_clickable = next_page.find_element_by_xpath("..").get_attribute("class")  # li
        if next_clickable == 'disabled':
            break
        print("Go to next page ...")
        next_page.click()
        sleep(3)
    except Exception as e:
        print(e)
        break
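The stale-element point can be illustrated without a browser. The sketch below uses invented stand-ins (`Cell`, `FakeBrowser` are hypothetical names, not selenium classes) to show why element references captured before a page change become unusable, and why re-locating elements after each navigation, as the loop above does, avoids the problem:

```python
class StaleElementReferenceException(Exception):
    """Stand-in for selenium's exception of the same name."""

class Cell:
    """Minimal stand-in for a WebElement: reading .text fails once stale."""
    def __init__(self, value):
        self._value = value
        self.stale = False

    @property
    def text(self):
        if self.stale:
            raise StaleElementReferenceException(self._value)
        return self._value

class FakeBrowser:
    """Hypothetical WebDriver stand-in: each page load invalidates old cells."""
    def __init__(self, pages):
        self.pages = pages
        self.current = 0
        self.live_cells = [Cell(v) for v in pages[0]]

    def next_page(self):
        for c in self.live_cells:   # old references detach from the "DOM"
            c.stale = True
        self.current += 1
        self.live_cells = [Cell(v) for v in self.pages[self.current]]

    def find_cells(self):
        return list(self.live_cells)

browser = FakeBrowser([["A", "B"], ["C", "D"]])

# Anti-pattern: hold on to references across a page change.
old_refs = browser.find_cells()
browser.next_page()
try:
    [c.text for c in old_refs]
except StaleElementReferenceException:
    print("stale reference detected")   # old handles are now invalid

# Fix: re-locate after every navigation, as the answer's loop does.
print([c.text for c in browser.find_cells()])   # ['C', 'D']
```

This is also why the answer re-finds each `td` by an indexed XPath inside the loop instead of reusing the `table_data` list after clicking "Next".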
