How to find the attribute and element id by selenium.webdriver?

I am learning web scraping since I need it for my work. I wrote the following code:

import pandas as pd
from selenium import webdriver
chromedriver='/home/es/drivers/chromedriver'
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)
driver.get('http://crdd.osdd.net/raghava/hemolytik/submitkey_browse.php?ran=1955')
df = pd.read_html(driver.find_element_by_id("table.example.display.datatable").get_attribute('example'))[0]

However, it shows the following error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="table.example.display.datatable"]"}
  (Session info: chrome=103.0.5060.134)

Then I inspected the table that I want to scrape from this page.

What attribute needs to be passed to get_attribute() in the following line?

df = pd.read_html(driver.find_element_by_id("table.example.display.datatable").get_attribute('example'))[0]

What should I write in driver.find_element_by_id?

EDITED: Some tables have many records spread over multiple pages. For example, this page has 2,246 entries and shows 100 entries per page. When I tried to scrape it, df contained only 320 entries with record IDs from 1232 to 1713, which means it picked up entries from some of the later pages instead of running from the first page through to the last.

What can we do in such cases?

If you want to select the table by @id, you need:

driver.find_element_by_id("example")

By CSS selector:

driver.find_element_by_css_selector("table#example")

By XPath:

driver.find_element_by_xpath("//table[@id='example']")

If you want to extract the @id value, you need:

.get_attribute('id')

Since there is not much sense in searching by @id only to extract that same @id, you might locate the table node by another attribute instead:

driver.find_element_by_xpath("//table[@aria-describedby='example_info']").get_attribute('id')
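To make the id lookup concrete, here is a small standard-library sketch of how the attributes of the table tag are separated. The markup in SAMPLE is an assumption: the id and aria-describedby values come from this answer, while the class names are only guessed from the string used in the question. TableAttrs is a hypothetical helper name.

```python
from html.parser import HTMLParser


class TableAttrs(HTMLParser):
    """Collect the attributes of every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.tables = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append(dict(attrs))


# Assumed markup, not copied from the real page.
SAMPLE = (
    '<table id="example" class="display dataTable" '
    'aria-describedby="example_info"></table>'
)

parser = TableAttrs()
parser.feed(SAMPLE)
attrs = parser.tables[0]

# The id attribute is a single value; an id lookup compares against all of it.
print(attrs["id"])
# Class names live in a separate attribute and are not part of the id.
print(attrs["class"].split())
# The string from the question does not equal the id, so the lookup finds nothing.
print(attrs["id"] == "table.example.display.datatable")
```

Under these assumptions, the string in the question looks like a tag name followed by class names rather than an id value, which would explain the NoSuchElementException. Note also that in Selenium 4.3 and later the find_element_by_* helpers no longer exist; the equivalent call there is driver.find_element(By.ID, "example"), with By imported from selenium.webdriver.common.by.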

You need to get the outerHTML of the table element first, then pass that HTML to pandas so it can parse the table.

You also need to wait for the element to be visible. Use an explicit wait such as WebDriverWait():

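driver.get('http://crdd.osdd.net/raghava/hemolytik/submitkey_browse.php?ran=1955')
# Wait up to 10 seconds until the table with id "example" is visible
table = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#example")))
# outerHTML holds the <table> markup itself, which pandas can parse
table_html = table.get_attribute("outerHTML")
df = pd.read_html(table_html)[0]
print(df)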

Import the following libraries:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import pandas as pd

Output:

     ID      PMID  YEAR  ...                                 DSSP Natural Structure Final Structure
0   1643  16137634  2005  ...                     CCCCCCCCCCCSCCCC               NaN             NaN
1   1644  16137634  2005  ...                        CCTTSCCSSCCCC               NaN             NaN
2   1645  16137634  2005  ...                   CTTTCGGGHHHHHHHHCC               NaN             NaN
3   1646  16137634  2005  ...                   CGGGTTTHHHHHHHGGGC               NaN             NaN
4   1647  16137634  2005  ...                CCSCCCSSCHHHHHHHHHTTC               NaN             NaN
5   1910  16730859  2006  ...  CCCCCCCSSCCSHHHHHHHHTTHHHHHHHHSSCCC               NaN             NaN
6   1911  16730859  2006  ...                                CCSCC               NaN             NaN
7   1912  16730859  2006  ...                            CCSSSCSCC               NaN             NaN
8   1913  16730859  2006  ...       CCCSSCCSSCCSHHHHHTTHHHHTTTCSCC               NaN             NaN
9   1914  16730859  2006  ...                 CCSHHHHHHHHHHHHHCCCC               NaN             NaN
10  2110  11226440  2001  ...              CCCSSCCCBTTBTSSSSSSCSCC               NaN             NaN
11  3799   9204560  1997  ...                               CCSSCC               NaN             NaN
12  4149  16137634  2005  ...                       CCHHHHHHHHHHHC               NaN             NaN

[13 rows x 17 columns]
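To illustrate what the explicit wait above does conceptually, here is a minimal standard-library sketch: poll a condition until it returns something truthy or a timeout expires. wait_until, table_is_ready and the parameter names are hypothetical and chosen for illustration; this is not Selenium's implementation. WebDriverWait works along similar lines, polling every 0.5 seconds by default and ignoring NoSuchElementException while it polls.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition produced
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Stand-in for "is the table visible yet?": succeeds on the third check.
attempts = []


def table_is_ready():
    attempts.append(1)
    return "table" if len(attempts) >= 3 else None


print(wait_until(table_is_ready, timeout=2.0, poll_frequency=0.01))
```

The point of this design is that the wait ends as soon as the condition holds, rather than always sleeping for a fixed time, and it fails loudly when the element never appears.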

I personally suggest using explicit waits instead of implicit ones.
In any case, it is not clear what you are trying to do or what you are looking for, so I will stick to the question and show how I would find an element ID:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get('http://crdd.osdd.net/raghava/hemolytik/submitkey_browse.php?ran=1955')
df = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "<XPATH_OF_THE_ELEMENT_YOU_WANT>"))).get_attribute("id")

By the way, I suggest reading the documentation that explains in detail how to locate elements.
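For the multi-page case raised in the EDITED part of the question, one possible approach is to read the visible page, click the "next" control, and repeat until that control is disabled. The sketch below keeps the paging loop separate from the browser calls. Everything in it is an assumption for illustration: collect_all_pages and scrape_with_selenium are hypothetical names, and the example_next id and disabled class follow the usual DataTables naming but have not been checked against this page.

```python
from typing import Callable, List


def collect_all_pages(
    read_page: Callable[[], List[dict]],
    has_next: Callable[[], bool],
    go_next: Callable[[], None],
    max_pages: int = 1000,
) -> List[dict]:
    """Read the current page, then advance until there is no next page."""
    rows: List[dict] = []
    for _ in range(max_pages):  # hard cap so a faulty "next" check cannot loop forever
        rows.extend(read_page())
        if not has_next():
            break
        go_next()
    return rows


def scrape_with_selenium(driver):
    """Hypothetical wiring for a DataTables-style pager (selectors are assumed)."""
    # Imported lazily so the paging loop above can be used without a browser.
    from io import StringIO

    import pandas as pd
    from selenium.webdriver.common.by import By

    def read_page():
        table = driver.find_element(By.CSS_SELECTOR, "table#example")
        html = table.get_attribute("outerHTML")
        return pd.read_html(StringIO(html))[0].to_dict("records")

    def next_button():
        # Assumed id: DataTables usually names this control "<table id>_next".
        return driver.find_element(By.ID, "example_next")

    def has_next():
        # Assumed convention: the control carries a "disabled" class on the last page.
        return "disabled" not in (next_button().get_attribute("class") or "")

    def go_next():
        next_button().click()

    return pd.DataFrame(collect_all_pages(read_page, has_next, go_next))
```

With a driver that has already loaded the page, df = scrape_with_selenium(driver) would be the intended call. If the table fetches each page from the server, an explicit wait after go_next() would probably be needed before reading the next page.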
