Find all elements on a web page using Selenium and Python
I am trying to go through a webpage with Selenium and create a set of all elements with certain class names, so I have been using:
elements = set(driver.find_elements_by_class_name('class name'))
However, in some cases there are thousands of elements on the page (if I scroll down), and I've noticed that this code only finds the first 18-20 elements on the page (only about 14-16 are visible to me at once). Do I need to scroll, or am I doing something else wrong? Is there any way to instantly get all of the elements I want from the HTML into a list, without having to visually see them on the screen?
It depends on your webpage. Just look at the HTML source code (or the network log) before you scroll down. If there are only the 18-20 elements, then the page lazy-loads the next items (e.g. Twitter or Instagram). This means the server only renders the next items once you reach a certain point on the page. Otherwise all thousand items would be loaded at once, which would increase the page size, loading time and server load.
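You can also check for lazy loading programmatically instead of reading the source by hand: compare the element count before and after a single scroll to the bottom. A minimal sketch (the class name, pause length, and function name are placeholders; the string `"class name"` is what Selenium 4's `By.CLASS_NAME` expands to, so the sketch needs no extra imports):

```python
import time

def grows_after_scroll(driver, class_name, pause=2.0):
    """Return True if scrolling to the bottom loads more matching
    elements, i.e. the page lazy-loads its content."""
    before = len(driver.find_elements("class name", class_name))
    # One scroll to the bottom; a lazy-loading page fetches the next chunk now.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)  # give the page time to render the new items
    after = len(driver.find_elements("class name", class_name))
    return after > before
```

If this returns True, the missing elements simply aren't in the DOM yet and no locator strategy will find them until you scroll.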
In this case, you have to scroll down to the end and then get the source code to parse all items.
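Once the page is fully scrolled, you can parse the complete HTML (e.g. `driver.page_source`) without the elements ever being visible on screen. A sketch using only the standard library's `html.parser`; the class name and helper names are placeholders, and a full parser like BeautifulSoup would work just as well:

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collect every start tag that carries a given CSS class."""
    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_dict = dict(attrs)
        # A class attribute may hold several space-separated classes.
        classes = (attr_dict.get("class") or "").split()
        if self.class_name in classes:
            self.matches.append((tag, attr_dict))

def find_by_class(html, class_name):
    parser = ClassCollector(class_name)
    parser.feed(html)
    return parser.matches
```

Usage would then be something like `matches = find_by_class(driver.page_source, "class name")`, which sees the whole DOM at once regardless of what is on screen.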
Probably you can use more advanced methods, such as treating each loaded chunk as a page in a pagination scheme (i.e. "scroll down" instead of "go to next page"). But I guess you're a beginner, so I would start with simply scrolling down to the end (scroll, wait, scroll, ... until no new elements appear), then fetching the HTML and parsing it.
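That scroll-wait-scroll loop could look like this. A minimal sketch, assuming a recent Selenium (where the `find_elements_by_class_name` helpers are replaced by `find_elements(by, value)`; the string `"class name"` is the value of `By.CLASS_NAME`, used here so the sketch needs no Selenium imports). The pause, round limit, and function name are placeholders to tune for your page:

```python
import time

def collect_all_elements(driver, class_name, pause=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until no new matching elements
    appear, then return every element found in the DOM."""
    seen = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # let the page lazy-load the next chunk
        found = driver.find_elements("class name", class_name)
        if len(found) == seen:
            break  # no new elements -> we reached the real end
        seen = len(found)
    return driver.find_elements("class name", class_name)
```

Then `elements = set(collect_all_elements(driver, "class name"))` should contain all items, not just the first screenful. For slow pages, replacing the fixed `time.sleep` with an explicit wait on the element count would be more robust.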