
Get Nightmare to wait for next page load after clicking link

I'm using nightmare.js to scrape public records and am trying to get the scraper to wait for the next page to load. I'm crawling search results, pressing a next button to get to each following page. I can't use nightmare.wait(someConstTime) to reliably wait for the next page, because someConstTime is sometimes shorter than the time the next page takes to load (although that is always under 30 seconds). I also can't use nightmare.wait(selector), because the same selectors are present on every result page. In that case nightmare doesn't wait at all, because the selector is already present (on the page I already scraped), so it scrapes the same page several times unless the new page happens to load before the next loop iteration.

How can I conditionally wait for the next page to load after I click the next button?

If I could figure out how, I would compare the "Showing # to # of ## entries" indicator of the current page (currentPageStatus) to the last known value (lastPageStatus) and wait until they differ (meaning the next page has loaded).

(Ignore that the example image only has one search result page.)

I'd do that using the code from https://stackoverflow.com/a/36734481/3491991, but that would require passing lastPageStatus into deferredWait, which I can't figure out how to do.

Here's the code I've got so far:

// Load dependencies
//const { csvFormat } = require('d3-dsv');
const Nightmare = require('nightmare');
const fs = require('fs');
var vo = require('vo');

const START = 'http://propertytax.peoriacounty.org';
var parcelPrefixes = ["01","02","03","04","05","06","07","08","09","10",
                      "11","12","13","14","15","16","17","18","19"]

vo(main)(function(err, result) {
  if (err) throw err;
});

function* main() {
  var nightmare = Nightmare(),
    currentPage = 0;
    // Go to Peoria Tax Records Search
    try {
      yield nightmare
        .goto(START)
        .wait('input[name="property_key"]')
        .insert('input[name="property_key"]', parcelPrefixes[0])
        // Click search button (#btn btn-success)
        .click('.btn.btn-success')
    } catch(e) {
      console.error(e)
    }
    // Get parcel numbers ten at a time
    try {
      yield nightmare
        .wait('.sorting_1')
        isLastPage = yield nightmare.visible('.paginate_button.next.disabled')
        while (!isLastPage) {
          console.log('The current page should be: ', currentPage); // Display page status
          try {
            const result = yield nightmare
              .evaluate(() => {
                return [...document.querySelectorAll('.sorting_1')]
                  .map(el => el.innerText);
              })
              // Save property numbers
              // fs.appendFile('parcels.txt', result, (err) => {
              //   if (err) throw err;
              //   console.log('The "data to append" was appended to file!');
              // });
          } catch(e) {
            console.error(e);
            return undefined;
          }
          yield nightmare
            // Click next page button
            .click('.paginate_button.next')
            // ************* THIS IS WHERE I NEED HELP *************** BEGIN
            // Wait for next page to load before continue while loop
            try {
              const currentPageStatus = yield nightmare
                .evaluate(() => {
                  return document.querySelector('.dataTables_info').innerText;
                })
              console.log(currentPageStatus);
            } catch(e) {
              console.error(e);
              return undefined;
            }
            // ************* THIS IS WHERE I NEED HELP *************** END
          currentPage++;
          isLastPage = yield nightmare.visible('.paginate_button.next.disabled')
        }
    } catch(e) {
      console.error(e)
    }
  yield nightmare.end();
}
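The comparison described in the question can be sketched with the function form of wait, assuming a Nightmare version in which .wait(fn, arg1, ...) forwards the extra arguments to fn inside the page. That is one way to get lastPageStatus into the waiting code. The names statusChanged and goToNextPage are hypothetical helpers, not part of Nightmare; the selectors are the ones from the question's code.

```javascript
// Runs inside the page (passed to nightmare.wait): returns true once the
// DataTables info text differs from the value captured before the click.
function statusChanged(lastPageStatus) {
  const info = document.querySelector('.dataTables_info');
  return info !== null && info.innerText !== lastPageStatus;
}

// Hypothetical helper: read the current status, click "next", then block
// until the status text changes. `nightmare` is an existing Nightmare instance.
function* goToNextPage(nightmare) {
  const lastPageStatus = yield nightmare.evaluate(
    () => document.querySelector('.dataTables_info').innerText
  );
  yield nightmare
    .click('.paginate_button.next')
    // Assumption: extra arguments to wait() are passed to the predicate.
    .wait(statusChanged, lastPageStatus);
  return lastPageStatus;
}
```

Inside main this would replace the marked block with something like `yield* goToNextPage(nightmare);`. The wait is still bounded by Nightmare's waitTimeout option, so a page that never changes raises an error instead of hanging.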

I had a similar issue that I managed to fix. I had to navigate to a search page, select the '100 per page' option, and then wait for the refresh. The only problem was that it was a crapshoot whether a manual wait time gave the AJAX request long enough to fire and repopulate the page with more than 10 results (the default).

I ended up doing this:

nightmare
.goto(url)
.wait('input.button.primary')
.click('input.button.primary')
.wait('#searchresults')
.select('#resultsPerPage',"100")
.click('input.button.primary')
.wait('.searchresult:nth-child(11)')
.evaluate(function() {
    ...
})
.end()

With this, the evaluate won't fire until there are at least 11 divs with the class searchresult. Given that the default is 10, that can only happen once the reload has completed.

You could extend this by scraping the total number of available results from the first page, to make sure that there are (in my case) more than 10 available. But the basic concept works.
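A minimal sketch of that extension, assuming the total number of available results has already been scraped into a number. reloadSelector is a hypothetical helper, and the .searchresult class is the one from the snippet above. It returns null when the row count cannot grow past the default, in which case this selector trick cannot detect the reload and some other wait is needed.

```javascript
// Hypothetical helper: choose a selector that can only match after the
// results have been repopulated with more rows than the default page size.
function reloadSelector(totalAvailable, defaultPageSize, requestedPageSize) {
  // Rows expected on the first page once the new page size is applied.
  const expectedRows = Math.min(totalAvailable, requestedPageSize);
  if (expectedRows <= defaultPageSize) {
    // The row count will not grow, so a new row can never appear.
    return null;
  }
  // The first row beyond the default page only exists after the reload.
  return `.searchresult:nth-child(${defaultPageSize + 1})`;
}
```

With 250 available results, a default of 10, and a requested page size of 100, this yields the same '.searchresult:nth-child(11)' selector used above.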

From what I understand, you need the DOM change to complete before you start extracting from the page being loaded.

In your case, the element whose DOM changes is the table with the CSS selector '#search-results'.

I think MutationObserver is what you need.

I have used the Mutation Summary library, which provides a nice wrapper around the raw functionality of MutationObserver, to achieve something similar:

var observer = new MutationSummary({
  callback: updateWidgets,
  queries: [{
    element: '[data-widget]'
  }]
});

(From the Mutation Summary tutorial.)

First, register the MutationSummary observer once the search results have loaded.

Then, after clicking 'Next', use nightmare.evaluate to wait for the MutationSummary callback and return the extracted values.
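The two steps above can be sketched with the raw MutationObserver API rather than Mutation Summary (a deliberate swap, to avoid depending on the library). clickAndExtract is a hypothetical function meant to run inside the page. It assumes a Nightmare version whose evaluate() waits for a returned promise, and it clicks the button from inside the page so that the observer is registered before the click.

```javascript
// Runs in the page: observe the table, click "next", and resolve with the
// text of every matching cell after the first batch of mutations.
function clickAndExtract(tableSelector, nextSelector, cellSelector, timeoutMs) {
  return new Promise((resolve, reject) => {
    const table = document.querySelector(tableSelector);
    const nextButton = document.querySelector(nextSelector);
    if (!table || !nextButton) {
      reject(new Error('Table or next button not found'));
      return;
    }
    let timer = null;
    const observer = new MutationObserver(() => {
      clearTimeout(timer);
      observer.disconnect();
      const cells = Array.from(document.querySelectorAll(cellSelector));
      resolve(cells.map(el => el.innerText));
    });
    // Watch for rows being added, removed, or rewritten under the table.
    observer.observe(table, { childList: true, subtree: true, characterData: true });
    // Give up if nothing changes within timeoutMs.
    timer = setTimeout(() => {
      observer.disconnect();
      reject(new Error('Timed out waiting for ' + tableSelector + ' to change'));
    }, timeoutMs);
    nextButton.click();
  });
}
```

It could be called as `yield nightmare.evaluate(clickAndExtract, '#search-results', '.paginate_button.next', '.sorting_1', 30000)`. Because it resolves on the first batch of mutations, it may fire too early on a page that redraws the table in several steps.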
