[英]How Scraping Table Web-Site with Button "Option value"?
特别是我试图废弃这张表( https://whalewisdom.com/filer/berkshire-hathaway-inc#tabholdings_tab_link )但我想通过 python 代码,前 50 行进行抓取。
出于这个原因,我需要设置选项值才能看到每页的前 50 行:
我目前的代码是:
test = {}
dict_scr = {}
for ii in range (0,12):
options = webdriver.FirefoxOptions()
options.binary_location = r'C:/Users/Mozilla Firefox/firefox.exe'
driver = selenium.webdriver.Firefox(executable_path='C:/Users/geckodriver.exe' , options=options)
driver.execute("get", {'url': link_scr['Links'][ii]})
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='50']"))))
test[link_scr.index[ii]] = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "table#current_holdings_table"))).get_attribute("outerHTML")
dict_scr[link_scr.index[ii]] = pd.read_html(test[link_scr.index[ii]])
print(test[link_scr.index[ii]])
如何修改此代码以获得第 50 行抓取 dataframe?
提前感谢您的帮助。
我写了两个示例,可以参考github :
样本:
from time import sleep
from clicknium import clicknium as cc, locator
tab = cc.chrome.open("https://whalewisdom.com/filer/berkshire-hathaway-inc#tabholdings_tab_link")
tab.find_element(locator.chrome.whalewisdom.button_25).click()
tab.find_element(locator.chrome.whalewisdom.a_50).click()
sleep(3) #wait for table laoded
elems_sector = tab.find_elements(locator.chrome.whalewisdom.td_informationtechnology)
elemns_shares = tab.find_elements(locator.chrome.whalewisdom.td_890923410)
count = len(elems_sector)
for idx in range(count):
sector = elems_sector[idx].get_text()
shares = elemns_shares[idx].get_text()
print({'sector': sector, 'shares': shares})
sample1:不改变页码,抓取两页数据
from time import sleep
from clicknium import clicknium as cc, locator
tab = cc.chrome.open("https://whalewisdom.com/filer/berkshire-hathaway-inc#tabholdings_tab_link")
i = 0
while True:
elems_sector = tab.find_elements(locator.chrome.whalewisdom.td_informationtechnology)
elemns_shares = tab.find_elements(locator.chrome.whalewisdom.td_890923410)
count = len(elems_sector)
for idx in range(count):
sector = elems_sector[idx].get_text()
shares = elemns_shares[idx].get_text()
print({'sector': sector, 'shares': shares})
i += 1
if i>1:
break
tab.find_element(locator.chrome.whalewisdom.a).click()
sleep(2) #wait for table loaded
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.