
How can I get all XHR calls in Puppeteer?

I am using Puppeteer to load a web page.

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
  console.log(request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle2' },
);

I set request interception to true and log all request URLs. Far fewer requests are logged than I see when I load the URL in the Chrome browser. At least one request, https://www.onthehouse.com.au/odin/api/compositeSearch, can be found in the Chrome dev tools but does not show up in the output of the code above.

How can I log all requests?

I benchmarked 4 variants of this script, and for me the results were the same. Note: I ran multiple tests; sometimes, due to local network speed, there were fewer calls, but after 2-3 tries Puppeteer was able to catch all requests.

The https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195 page has some async and defer scripts. My hypothesis was that they may load differently under different Puppeteer settings, or with async vs. sync functions inside page.on.

Note 2: I tested a different page, not the one in the original question, because I already needed a VPN to visit this Australian website. That was easy from Chrome, but it would take more effort with Puppeteer. Trust me, the page I tested has a similar number of analytics and tracking requests.


Baseline from the Chrome Network tab: 28 calls

First I visited the xy webpage in Chrome; the Network tab showed 28 calls.

Case 1: Original (sync, networkidle2)

await page.setRequestInterception(true);
page.on('request', (request) => {
  console.log(request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle2' },
);

Result: 28 calls

Case 2: Async, networkidle2

The page.on handler is an async function, so we can await request.url() (request.url() returns a plain string rather than a promise, so the await is harmless but has no effect).

await page.setRequestInterception(true);
page.on('request', async (request) => {
  console.log(await request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle2' },
);

Result: 28 calls

Case 3: Sync, networkidle0

Similar to the original, but with networkidle0.

await page.setRequestInterception(true);
page.on('request', (request) => {
  console.log(request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle0' },
);

Result: 28 calls

Case 4: Async, networkidle0

The page.on handler is an async function, so we can await request.url(), this time combined with networkidle0.

await page.setRequestInterception(true);
page.on('request', async (request) => {
  console.log(await request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle0' },
);

Result: 28 calls
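For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how the counting could be automated. The helper names trackRequests and compareWaitUntil are hypothetical, not part of Puppeteer, and the sketch assumes the puppeteer package is installed. It does not enable request interception, since the 'request' event fires without it; interception is only needed to modify or block requests.

```javascript
// Collect the URL of every request emitted by a Puppeteer-style page object.
// Returns the array that the listener keeps appending to.
function trackRequests(page) {
  const urls = [];
  page.on('request', (request) => {
    urls.push(request.url());
  });
  return urls;
}

// Hypothetical driver: loads `url` once per waitUntil mode, each time in a
// fresh browser, and returns an object mapping mode -> number of requests.
async function compareWaitUntil(url, modes = ['networkidle2', 'networkidle0']) {
  const puppeteer = require('puppeteer'); // loaded lazily; assumes it is installed
  const counts = {};
  for (const waitUntil of modes) {
    const browser = await puppeteer.launch({ headless: true });
    try {
      const page = await browser.newPage();
      const urls = trackRequests(page); // attach before navigating
      await page.goto(url, { waitUntil });
      counts[waitUntil] = urls.length;
    } finally {
      await browser.close();
    }
  }
  return counts;
}
```

Calling compareWaitUntil('https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195').then(console.log) would print one count per mode, which can then be checked against the Network tab.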


As there was no difference between the number of requests on the Network tab and in Puppeteer, regardless of how we launch Puppeteer or how we collect the requests, my idea is:

  • Either you have accepted the cookie consent in your Chrome, so the Network tab shows more requests (these requests only happen after the cookies are accepted). You can accept their cookie policy with a simple navigation, so once you have navigated inside their site there will immediately be more requests on the Network tab.

    [...] By continuing to use our website, you consent to cookies being used.

Solution: Do not visit the desired page directly, but navigate there through clicks, so your Puppeteer's Chromium accepts the cookie consent and you get all the analytics requests as well.

  • Or some Chrome add-on affects the number of requests on the page.

Advice: Compare your Puppeteer requests against the Network tab of an incognito Chrome window, and make sure all extensions/add-ons are disabled.
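The "navigate there through clicks" suggestion could be sketched roughly as follows. The function name navigateByClick, the start URL, and the link selector are all assumptions for illustration; the real selector depends on the site's markup, which I have not inspected.

```javascript
// Hypothetical sketch: open a landing page first, then click a link to reach
// the target page, so consent-gated requests get a chance to fire.
// `startUrl` and `linkSelector` are placeholders supplied by the caller.
async function navigateByClick(page, startUrl, linkSelector) {
  await page.goto(startUrl, { waitUntil: 'networkidle2' });
  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(linkSelector),
  ]);
  return page.url(); // URL the page ended up on after the click
}
```

Attach the page.on('request', ...) listener before calling this function, so requests from both the landing page and the target page are logged.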


+ If you are only interested in XHR, you may need to use request.resourceType() in your code to differentiate XHR requests from the others (see the docs).
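A minimal sketch of such a filter is below. The helper name logXhrRequests is hypothetical, and including 'fetch' alongside 'xhr' is my own choice, on the assumption that calls made with fetch() are wanted as well; drop it from the set if only XMLHttpRequest calls matter.

```javascript
// Resource types to keep: 'xhr' for XMLHttpRequest, 'fetch' for fetch() calls.
const XHR_TYPES = new Set(['xhr', 'fetch']);

// Log only XHR-like requests from a Puppeteer-style page object.
// Returns the array of matching URLs, which grows as requests arrive.
function logXhrRequests(page, log = console.log) {
  const seen = [];
  page.on('request', (request) => {
    if (XHR_TYPES.has(request.resourceType())) {
      seen.push(request.url());
      log(request.method(), request.url());
    }
  });
  return seen;
}
```

This listener only observes requests. If setRequestInterception(true) is enabled as in the question, a handler that calls request.continue() is still required, otherwise the requests will stall.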
