Find all elements on a web page using Selenium and Python

I am trying to go through a webpage with Selenium and create a set of all elements with certain class names, so I have been using:

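from selenium.webdriver.common.by import By  # find_elements_by_class_name was removed in Selenium 4.3
elements = set(driver.find_elements(By.CLASS_NAME, 'class name'))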

However, in some cases there are thousands of elements on the page (if I scroll down), and I've noticed that this code only finds the first 18-20 elements on the page (only about 14-16 are visible to me at once). Do I need to scroll, or am I doing something else wrong? Is there any way to get all of the elements I want from the HTML into a list at once, without having to see them on the screen?

It depends on your webpage. Look at the HTML source code (or the network log) before you scroll down. If it contains only those 18-20 elements, then the page lazy-loads the next items (e.g. Twitter or Instagram). This means the server only sends the next items once you reach a certain point on the page. Otherwise all of the thousands of items would be loaded at once, which would increase the page size, loading time, and server load.
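One way to make that check less manual is to count the matching elements in the HTML you already have. The following is only a sketch using the standard library's html.parser; the helper name count_by_class and the class name "item" are made up for illustration, and the HTML string could come from driver.page_source or from a saved copy of the page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_by_class(html, class_name):
    """Return how many elements in the HTML string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical sample markup standing in for the real page source.
sample = '<div class="item a"></div><div class="item"></div><span class="items"></span>'
print(count_by_class(sample, "item"))
```

If the count from the initial source is about the same as what find_elements returns, and much smaller than the number of items you can reach by scrolling, the page is most likely loading the rest on demand.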

In this case, you have to scroll down to the end and then get the source code to parse all items.

You could probably use more advanced methods, like treating each chunk as a page in a pagination scheme (e.g. not saying "go to next page" but saying "scroll down"). But I guess you're a beginner, so I would start by simply scrolling down to the end (e.g. scroll, wait, scroll, ... until there are no new elements), then fetching the HTML and parsing it.
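The simple scroll, wait, scroll loop could look roughly like this. It is a sketch under a few assumptions: the page scrolls the main window rather than an inner container, loaded items stay in the DOM (some sites remove off-screen items again), and a fixed pause is long enough for new items to arrive. The function name, the pause and max_rounds parameters, and the URL and class name in run_example are placeholders, not part of Selenium.

```python
import time


def collect_after_scrolling(driver, by, value, pause=1.0, max_rounds=100):
    """Scroll to the bottom repeatedly until no new matching elements appear."""
    count = len(driver.find_elements(by, value))
    for _ in range(max_rounds):
        # Jump to the current bottom of the page to trigger the next chunk.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for lazy-loaded items
        new_count = len(driver.find_elements(by, value))
        if new_count == count:
            break  # nothing new was loaded, assume we reached the end
        count = new_count
    return driver.find_elements(by, value)


def run_example():
    """Not called automatically; needs Selenium 4 and a local Chrome."""
    # Imported here so the helper above can be tried without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/feed")  # placeholder URL
        elements = collect_after_scrolling(driver, By.CLASS_NAME, "item")
        print(len(elements))
    finally:
        driver.quit()
```

The max_rounds cap is there so that a feed which never ends does not keep the loop running forever. If a single pause is sometimes too short, the loop may stop early; raising pause or requiring two unchanged rounds in a row are possible adjustments.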
