Scraping a website from the Chrome console using JavaScript

Question

I'm trying to scrape a webpage that contains a table of 1000 elements. The webpage updates a tag whenever an element in the table is clicked. From the Chrome console, I want to click on each element, wait for the tag to be updated, and download the tag. Currently I am doing the following:

for (let i = 0; i < 1000; i++) {
  document.querySelectorAll('element_in table')[i].click();
  const text = document.querySelector('tag_to_read_from').innerHTML;
  // download text
}

The problem is that there is a delay before the tag is updated after an element in the table is clicked. As a result, due to the asynchronous nature of JavaScript, the script is downloading 1000 empty files.

Is there a way to wait for X seconds after clicking on an element, and then download the updated tag?

Answer 1

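function scrape(index, max) {
  document.querySelectorAll('element_in table')[index].click();

  // Wait 5 seconds for the page to update the tag before reading it.
  setTimeout(() => {
    const text = document.querySelector('tag_to_read_from').innerHTML;
    // download text

    // Move on to the next element; stop after the last one (index max - 1).
    if (index + 1 < max) scrape(index + 1, max);
  }, 5000);
}

scrape(0, 1000);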

You can use a timeout to delay the logic. Just pick a delay that is longer than the page takes to update the tag after a click.
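
The same fixed-delay idea can also be written with a Promise-based pause and async/await, which keeps the loop shape from the question. This is only a sketch: `sleep` and `scrapeAll` are hypothetical helper names that do not appear in the answer above, the selectors are the placeholders from the question, and the 5000 ms delay is still a guess that has to be longer than the page's update time.

```javascript
// Resolve after `ms` milliseconds, so an async function can pause with `await`.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Click each element in turn, wait, then read the updated text.
// `elements`: iterable of objects with a click() method (e.g. a NodeList).
// `readText`: function that returns the current content of the tag.
// `delayMs`: how long to wait after each click (an assumed, tunable value).
async function scrapeAll(elements, readText, delayMs) {
  const results = [];
  for (const element of elements) {
    element.click();
    await sleep(delayMs); // give the page time to update the tag
    results.push(readText());
  }
  return results;
}

// In the Chrome console, with the placeholder selectors from the question:
// const rows = document.querySelectorAll('element_in table');
// const read = () => document.querySelector('tag_to_read_from').innerHTML;
// scrapeAll(rows, read, 5000).then((texts) => console.log(texts));
```

Passing the elements and the reader in as arguments keeps the loop independent of the page, so the delay can be tried out on a few rows before running all 1000.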

Answer 1 (accepted, 2020-12-30 21:18:05)