
Python Selenium find element by class returns the whole website instead of the element

I am trying to use Selenium to read the table from this website into a pandas DataFrame (the URL is shown in the code below).

However, when I try to print the DataFrame, it gives me everything on the website, such as the top section (website search, advanced search) and the bottom section (Disclaimer | Hyperlink Policy | Privacy Policy, ©2010 Hong Kong Exchanges and Clearing Limited...), instead of just the table. I'm not sure what the issue is here.

import pandas
from selenium import webdriver

url = r'https://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm'

path_to_chrome_driver = r'C:\chromedriver.exe'
driver = webdriver.Chrome(executable_path=path_to_chrome_driver)
driver.get(url)

# locate the table body (the return value is not stored or used here)
driver.find_element_by_class_name('table_grey_border').find_element_by_tag_name('tbody')

# driver.page_source is the HTML of the whole page, so every table on it is parsed
z = pandas.read_html(driver.page_source, flavor='bs4')

print(z)

Note: I have also tried the code below, but got the same results.

driver.find_element_by_class_name('table_grey_border')

Since you don't use the return value of find_element_by_class_name, you won't see that element's result; you need to use the return value of your find_element_by_class_name call.

Also, you passed driver.page_source to read_html, which is the HTML of the whole page.

Change this:

driver.find_element_by_class_name('table_grey_border').find_element_by_tag_name('tbody')
z = pandas.read_html(driver.page_source, flavor='bs4')

To this:

res = driver.find_element_by_class_name('table_grey_border').find_element_by_tag_name('tbody')
print(res)  # prints the WebElement object itself, not its HTML

If you only need the HTML of that specific element, you need to use the following code:

driver.find_element_by_class_name('table_grey_border').find_element_by_tag_name('tbody').get_attribute('outerHTML')
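Putting it together, here is a minimal sketch of how the element's HTML could be fed to pandas.read_html instead of the full page source (the variable names table_html and df are illustrative, not from the original post). Note that read_html looks for a <table> tag, so the outerHTML of the table element itself, rather than just its tbody, is used here:

import pandas
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:\chromedriver.exe')
driver.get(r'https://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm')

# grab only the target table's HTML instead of the whole page source
table = driver.find_element_by_class_name('table_grey_border')
table_html = table.get_attribute('outerHTML')

# read_html now sees a single table and returns a list with one DataFrame
df = pandas.read_html(table_html, flavor='bs4')[0]
print(df)

driver.quit()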
