How to get around dynamic elements when web-scraping?

The code below works: I am able to click a button on the webpage using Python/Selenium/Firefox.

driver.execute_script('''return document.querySelector('dba-app').shadowRoot.getElementById('configRenderer').shadowRoot.querySelector('ing-default-layout-14579').querySelector('dba-overview').shadowRoot.querySelector('ing-feat-agreement-overview').shadowRoot.querySelector('ing-ow-overflow-menu-14587').shadowRoot.querySelector('button')''').click()

However, some element names are dynamic: the numeric suffixes change every time you rerun the script.

The changing elements:

  • 'ing-default-layout-14579'
  • 'ing-ow-overflow-menu-14587'

What must I do to get around the dynamic elements?

One option is to look for other attributes that stay the same across page loads. For example, given your HTML, you could do:

document.querySelector('#configRenderer') // returns the config renderer element
document.querySelector('[data-tag-name="ing-default-layout"]') // returns the ing-default-layout element
document.querySelector('[data-tag-name="dba-overview"]') // returns the dba-overview element

And so on. Or you could use the same method to identify a stable parent or child, and then navigate from it to the child or parent you actually need.
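Since the chain in the question crosses several shadow roots, the stable selectors can be resolved one step at a time. The following is only a sketch: `queryThroughShadows` is a hypothetical helper name, it assumes every shadow root involved is open, and the `data-tag-name` value for the overflow menu in the usage comment is an assumption modeled on the attributes shown above.

```javascript
// Resolve a list of selectors in order. At each step, look inside the
// current node's open shadow root first (if it has one), then fall back
// to its regular (light DOM) descendants. Returns null if any step fails.
function queryThroughShadows(root, selectors) {
  let current = root;
  for (const selector of selectors) {
    const inShadow = current.shadowRoot
      ? current.shadowRoot.querySelector(selector)
      : null;
    current = inShadow ?? current.querySelector(selector);
    if (current === null) {
      return null; // this step of the chain did not match
    }
  }
  return current;
}

// Possible usage in the page (selector values are assumptions):
// const button = queryThroughShadows(document, [
//   'dba-app',
//   '#configRenderer',
//   '[data-tag-name="ing-default-layout"]',
//   '[data-tag-name="dba-overview"]',
//   'ing-feat-agreement-overview',
//   '[data-tag-name="ing-ow-overflow-menu"]',
//   'button',
// ]);
```

From Selenium, the function source plus a final `return queryThroughShadows(document, [...])` line could be passed to `driver.execute_script`, as in the question, and `.click()` called on the result.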

If the HTML isn't stable enough even for that, another approach is to search through all elements and find the one(s) whose tagName starts with the prefix you need.

for (const elm of document.querySelectorAll('*')) {
  if (elm.tagName.toLowerCase().startsWith('ing-ow-overflow-menu')) {
    // do stuff with elm, which is the overflow menu element
  }
}
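One caveat: `document.querySelectorAll('*')` does not descend into shadow roots, and the elements in the question live inside nested shadow DOM. A variant of the loop above that also walks open shadow roots might look like the sketch below. `findByTagPrefix` is a hypothetical helper name, and closed shadow roots (where `shadowRoot` is `null`) remain unreachable this way.

```javascript
// Depth-first search for the first element whose tag name starts with
// `prefix`, descending into open shadow roots as well as regular children.
function findByTagPrefix(root, prefix) {
  const wanted = prefix.toLowerCase();
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    // Documents and shadow roots have no tagName, so they are skipped here.
    if (node.tagName && node.tagName.toLowerCase().startsWith(wanted)) {
      return node;
    }
    // Queue regular children in reverse so they are visited in document order.
    const kids = node.children ? Array.from(node.children) : [];
    for (let i = kids.length - 1; i >= 0; i--) {
      stack.push(kids[i]);
    }
    // Pushed last, so the shadow tree is searched before the light DOM children.
    if (node.shadowRoot) {
      stack.push(node.shadowRoot);
    }
  }
  return null;
}

// Possible usage in the page:
// const menu = findByTagPrefix(document, 'ing-ow-overflow-menu');
// const button = menu && findByTagPrefix(menu.shadowRoot, 'button');
```

Searching the whole tree is slower than following a known chain of selectors, but it keeps working when intermediate wrappers change as well as the numeric suffixes.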


 