Selenium loop script using too much RAM, eventually crashing Chrome

I have a very large list of URLs that I'm trying to scrape, and I'm iterating over every URL with a for loop.

Eventually, at some element x of the list, my Chrome window crashes (the 'Aw, Snap!' error appears in the browser window). I have no idea how to fix this issue.

I can't share my code, but it is something like this:

very_large_url_list = [url1, url2, url3, url4...]

for x in very_large_url_list:
    driver.get(x)
    doStuff()

If I try to close the driver on every iteration, like this:

for x in very_large_url_list:
    driver.get(x)
    doStuff()
    driver.close()

I get an error stating that the session ID is invalid. If I don't close it, a memory leak eventually happens and I can't finish iterating over the list. What can I do to fix this issue?

Please let me know if I haven't been clear enough so I can edit the question!

If you close the driver on every iteration, shouldn't you be creating a new one each time, like this?

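from selenium import webdriver

for x in very_large_url_list:
    driver = webdriver.Chrome()  # start a fresh browser session for this URL
    driver.get(x)
    doStuff()
    driver.quit()  # quit() ends the whole session; close() only closes the current window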
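Starting a new browser for every single URL can be slow on a very large list. One possible middle ground, sketched below, is to reuse a driver for a batch of pages and then replace it, so memory held by the old browser is released at regular intervals. The names `scrape_in_batches`, `make_driver`, `do_stuff`, and `restart_every` are hypothetical and not part of Selenium; the only driver methods the sketch relies on are `get()` and `quit()`. The batch size of 50 is an arbitrary assumption that would need tuning.

```python
from typing import Callable, Iterable


def scrape_in_batches(
    urls: Iterable[str],
    make_driver: Callable[[], object],
    do_stuff: Callable[[object], None],
    restart_every: int = 50,  # assumed batch size, tune for your pages
) -> int:
    """Visit every URL, replacing the browser after each `restart_every` pages.

    Returns the number of browser sessions that were started.
    """
    if restart_every < 1:
        raise ValueError("restart_every must be at least 1")

    sessions = 0
    driver = None
    try:
        for index, url in enumerate(urls):
            # At each batch boundary, shut the old browser down completely.
            if driver is not None and index % restart_every == 0:
                driver.quit()
                driver = None
            # Lazily start a browser when none is running.
            if driver is None:
                driver = make_driver()
                sessions += 1
            driver.get(url)
            do_stuff(driver)
    finally:
        # Make sure the last browser is closed even if a page raised an error.
        if driver is not None:
            driver.quit()
    return sessions
```

With Selenium installed, a call might look like `scrape_in_batches(very_large_url_list, webdriver.Chrome, lambda d: doStuff())`. Passing the driver factory in as an argument keeps the loop itself independent of any particular browser.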

Did you know that we can also open a URL without calling a method such as get() or navigate()? It is a frequently asked interview question as well. Let's learn it.

Let's perform some steps first:

  1. Open a browser.
  2. Press F12.
  3. Switch to the Console tab.
  4. Type window.location='https://www.redbus.in' and hit the Enter key.

You will notice that the redbus website is loaded.

This is a way of loading a URL without using methods like get() or navigate(). The statement above is a JavaScript command. We will see JavaScript concepts later.
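The same JavaScript statement can be sent from a Selenium script through the driver's `execute_script()` method. The sketch below assumes an already created driver; the helper name `navigate_with_js` is hypothetical. Unlike `get()`, this call may return before the new page has finished loading, so any code that follows may need its own wait.

```python
def navigate_with_js(driver, url: str) -> None:
    """Load `url` by assigning window.location instead of calling driver.get()."""
    # Selenium exposes extra Python arguments to the script as arguments[0], arguments[1], ...
    driver.execute_script("window.location = arguments[0];", url)
```

For example, `navigate_with_js(driver, "https://www.redbus.in")` would run the console statement from step 4 inside the controlled browser.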
