How to extract data from a dynamic collapsing table with hidden elements using Selenium in Python
I am trying to scrape the 20 classifications from https://patents.google.com/patent/JP2009517369A/en?oq=JP2009517369, of which only the first is displayed; the others are hidden in an expandable section.
I already tried to get the first visible one with:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree' and not(@hidden)]/state-modifier[@class='code style-scope classification-tree']/a[@class='style-scope state-modifier']"))).get_attribute("innerHTML")
However, it raises an exception and I don't know why. So I figured that scraping the whole table would be easier, but most of the elements are collapsed.
Is there an approach for scraping dynamic, hidden tables? Thank you for your help!
The first two options should print the value C07C311/51:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree' and not(@hidden)]/state-modifier[@class='code style-scope classification-tree']/a[@class='style-scope state-modifier']"))).text)
OR
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree' and not(@hidden)]/state-modifier[@class='code style-scope classification-tree']/a[@class='style-scope state-modifier']"))).get_attribute("innerHTML"))
However, if you do not get the expected value, try the last one. It reads the textContent attribute, which includes text that is not rendered, so it should print any hidden content:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree' and not(@hidden)]/state-modifier[@class='code style-scope classification-tree']/a[@class='style-scope state-modifier']"))).get_attribute("textContent"))
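To go from the first code to the whole list, one possible approach is to wait for presence instead of visibility and read textContent from every match. The sketch below is untested against the live page and rests on two assumptions: that the XPath from the question still matches the page markup, and that the collapsed entries already exist in the DOM and are only not displayed. The names clean_codes, CODE_XPATH and scrape_classifications are hypothetical helpers introduced for illustration.

```python
def clean_codes(raw_texts):
    """Strip whitespace, drop empty values and duplicates, keep first-seen order."""
    seen = set()
    codes = []
    for raw in raw_texts:
        code = (raw or "").strip()
        if code and code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


# XPath copied from the question; assumed to still match the page markup.
CODE_XPATH = (
    "//div[@class='style-scope classification-tree' and not(@hidden)]"
    "/state-modifier[@class='code style-scope classification-tree']"
    "/a[@class='style-scope state-modifier']"
)


def scrape_classifications(driver, timeout=20):
    """Return the classification codes found in the DOM, displayed or not."""
    # Imported here so clean_codes() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # presence_of_all_elements_located waits until matching elements exist
    # in the DOM; unlike the visibility_* conditions, it does not require
    # them to be displayed.
    links = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, CODE_XPATH))
    )
    # textContent is available for non-displayed elements as well, whereas
    # the .text property only returns rendered text.
    return clean_codes(link.get_attribute("textContent") for link in links)
```

With a driver that has already loaded the patent page, scrape_classifications(driver) would return a list of strings. Whether that list contains all 20 entries depends on the second assumption above; if the hidden rows are only added to the DOM after the section is expanded, the expander would have to be clicked first.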