
How can I scrape a table from a website that uses an "option value" button?

Specifically, I am trying to scrape this table ( https://whalewisdom.com/filer/berkshire-hathaway-inc#tabholdings_tab_link ), but I want to scrape the first 50 rows through Python code.

To do that, I need to set the option value so that each page shows 50 rows.

My current code is:

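import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

# link_scr (defined elsewhere) holds the page URLs in its 'Links' column
test = {}
dict_scr = {}
for ii in range(0, 12):
    options = webdriver.FirefoxOptions()
    options.binary_location = r'C:/Users/Mozilla Firefox/firefox.exe'
    driver = webdriver.Firefox(executable_path='C:/Users/geckodriver.exe', options=options)
    driver.get(link_scr['Links'][ii])

    # Attempt to switch the table to 50 rows per page
    Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='50']"))))

    # Grab the table's HTML and parse it with pandas
    test[link_scr.index[ii]] = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "table#current_holdings_table"))).get_attribute("outerHTML")
    dict_scr[link_scr.index[ii]] = pd.read_html(test[link_scr.index[ii]])
    print(test[link_scr.index[ii]])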

How can I modify this code so that the scraped DataFrame contains the first 50 rows?

Thanks in advance for your help.
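Once the table's outerHTML has been captured (as the code above does with get_attribute("outerHTML")), turning it into rows and keeping the first 50 does not depend on the browser step. Below is a minimal sketch of that part using only the standard library. The sample HTML, the TableRows class and the first_rows helper are hypothetical names made up for illustration, not taken from the site. With pandas, note that pd.read_html returns a list of DataFrames, so the rough equivalent would be to take element [0] and call .head(50) on it.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def first_rows(table_html, limit=50):
    """Return up to `limit` data rows as dicts keyed by the header row."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body[:limit]]


# Hypothetical stand-in for the outerHTML captured from the page (60 data rows).
sample_html = (
    "<table id='current_holdings_table'>"
    "<tr><th>Stock</th><th>Shares</th></tr>"
    + "".join(f"<tr><td>S{i}</td><td>{i * 10}</td></tr>" for i in range(60))
    + "</table>"
)
rows = first_rows(sample_html)
print(len(rows), rows[0])
```

This only trims what the browser has already rendered: if the page shows 25 rows, the table HTML contains 25 rows, so the page-size option still has to be changed in the browser first.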

I wrote two samples; you can refer to them on GitHub.

Sample:

from time import sleep
from clicknium import clicknium as cc, locator

tab = cc.chrome.open("https://whalewisdom.com/filer/berkshire-hathaway-inc#tabholdings_tab_link")
# Click the "25" button, then the "50" entry, to show 50 rows per page
tab.find_element(locator.chrome.whalewisdom.button_25).click()
tab.find_element(locator.chrome.whalewisdom.a_50).click()

sleep(3)  # wait for the table to load

# Collect the cells of the sector and shares columns
elems_sector = tab.find_elements(locator.chrome.whalewisdom.td_informationtechnology)
elems_shares = tab.find_elements(locator.chrome.whalewisdom.td_890923410)

# Print one dict per table row
count = len(elems_sector)
for idx in range(count):
    sector = elems_sector[idx].get_text()
    shares = elems_shares[idx].get_text()
    print({'sector': sector, 'shares': shares})

sample1: scrape two pages of data without changing the page size

from time import sleep
from clicknium import clicknium as cc, locator

tab = cc.chrome.open("https://whalewisdom.com/filer/berkshire-hathaway-inc#tabholdings_tab_link")

i = 0  # number of pages scraped so far

while True:
    # Collect the cells of the sector and shares columns on the current page
    elems_sector = tab.find_elements(locator.chrome.whalewisdom.td_informationtechnology)
    elems_shares = tab.find_elements(locator.chrome.whalewisdom.td_890923410)

    count = len(elems_sector)
    for idx in range(count):
        sector = elems_sector[idx].get_text()
        shares = elems_shares[idx].get_text()
        print({'sector': sector, 'shares': shares})
    i += 1
    if i > 1:  # stop after two pages
        break
    # Go to the next page
    tab.find_element(locator.chrome.whalewisdom.a).click()
    sleep(2)  # wait for the table to load
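The paging loop in sample1 can also be written independently of the browser library, which makes it easier to test and to reuse with a different page count. This is only a sketch of the pattern: scrape_pages, read_rows, go_next and the in-memory fake pages are hypothetical names introduced here, not part of clicknium.

```python
from time import sleep


def scrape_pages(read_rows, go_next, pages=2, pause=0.0):
    """Collect rows from `pages` consecutive pages.

    read_rows: callable returning the rows of the page currently shown.
    go_next:   callable that advances to the next page.
    pause:     seconds to wait after each advance so the table can reload.
    """
    collected = []
    for page in range(pages):
        collected.extend(read_rows())
        if page < pages - 1:  # no click after the last page
            go_next()
            sleep(pause)
    return collected


# Hypothetical in-memory "site" with two pages, standing in for the browser tab.
fake_pages = [
    [{"sector": "INFORMATION TECHNOLOGY", "shares": "1"}],
    [{"sector": "FINANCE", "shares": "2"}],
]
state = {"page": 0}


def next_page():
    state["page"] += 1


result = scrape_pages(lambda: fake_pages[state["page"]], next_page)
print(result)
```

With the sample above, read_rows would wrap the two find_elements calls and go_next would wrap tab.find_element(locator.chrome.whalewisdom.a).click(), with pause set to about 2 seconds.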

