How can I get all xhr calls in puppeteer?
I am using puppeteer to load a web page.
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
  // Log every intercepted request, then let it proceed.
  console.log(request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle2' },
);
I set request interception to true and log all request URLs. The requests I logged are far fewer than the requests made when I load the URL in the Chrome browser. At least one request, https://www.onthehouse.com.au/odin/api/compositeSearch, can be found in the Chrome dev tools but does not show up in the code above.
How can I log all requests?
I benchmarked 4 variants of this script, and for me the results were the same.
Note: I ran multiple tests; sometimes, due to local network speed, there were fewer calls. But after 2-3 tries Puppeteer was able to catch all requests.
On the https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195 page there are some async and defer scripts. My hypothesis was that they may load differently when we use different Puppeteer settings, or async vs. sync functions inside page.on.
Note 2: I tested another page, not the one in the original question, because I already needed a VPN to visit this Australian website. That was easy from Chrome, but with Puppeteer it would take more work. Trust me, the page I tested has a similar ton of analytics and tracking requests.
First I visited the xy webpage: the result was 28 calls on the Network tab.
await page.setRequestInterception(true);
page.on('request', (request) => {
  console.log(request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle2' },
);
Result: 28 calls
The page.on handler is an async function, so we can await request.url().
await page.setRequestInterception(true);
page.on('request', async (request) => {
  console.log(await request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle2' },
);
Result: 28 calls
Similar to the original, but with networkidle0.
await page.setRequestInterception(true);
page.on('request', (request) => {
  console.log(request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle0' },
);
Result: 28 calls
The page.on handler is an async function, so we can await request.url(), plus networkidle0.
await page.setRequestInterception(true);
page.on('request', async (request) => {
  console.log(await request.url());
  request.continue();
  // ...
});
await page.goto(
  'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
  { waitUntil: 'networkidle0' },
);
Result: 28 calls
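A comparison like the four runs above can be wrapped in a small helper. This is only a minimal sketch: countRequests is a hypothetical name, and page is assumed to be a Puppeteer Page (or any object exposing the same on, off, setRequestInterception and goto methods).

```javascript
// Hypothetical helper: collect the URL of every 'request' event
// emitted during a single navigation.
async function countRequests(page, url, waitUntil) {
  const urls = [];
  const onRequest = (request) => {
    urls.push(request.url());
    // With interception enabled, every request has to be continued,
    // otherwise the page load stalls.
    request.continue();
  };

  await page.setRequestInterception(true);
  page.on('request', onRequest);
  try {
    await page.goto(url, { waitUntil });
  } finally {
    // Detach the listener so a second run does not log twice.
    page.off('request', onRequest);
  }
  return urls;
}
```

Calling it once with 'networkidle2' and once with 'networkidle0' and comparing the lengths of the returned arrays reproduces the kind of check described above.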
As there was no difference between the number of requests on the Network tab and from Puppeteer, regardless of how we launch Puppeteer or how we collect the requests, my idea is that the site's cookie consent makes the difference:
[...] By continuing to use our website, you consent to cookies being used.
Solution: Do not visit the desired page directly; navigate there through clicks, so that Puppeteer's Chromium accepts the cookie consent and you get all the analytics requests as well.
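Navigating through clicks could look roughly like the sketch below. The helper name navigateThroughClicks, the start URL and the selectors are all assumptions made for illustration, and the sketch assumes every click triggers a navigation; the real site may need different selectors or waits.

```javascript
// Hypothetical helper: open a start page, then click through a list of
// link selectors, waiting for each navigation to finish.
async function navigateThroughClicks(page, startUrl, linkSelectors) {
  await page.goto(startUrl, { waitUntil: 'networkidle2' });

  for (const selector of linkSelectors) {
    await page.waitForSelector(selector);
    // Start waiting for the navigation before clicking, so the
    // navigation event cannot be missed.
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle2' }),
      page.click(selector),
    ]);
  }
}

// Example call (start URL and selector are assumed, not taken from the site):
// await navigateThroughClicks(page, 'https://www.onthehouse.com.au/',
//   ['a[href*="property-for-rent"]']);
```

Attach the page.on('request', ...) listener before calling the helper, so the requests made along the way are logged too.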
Advice: Check your Puppeteer requests against the Network tab of an incognito Chrome window, and make sure all extensions/add-ons are disabled.
+ If you are only interested in XHR, then you may need to add request.resourceType to your code to differentiate them from other requests (see the docs).
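Filtering on the resource type might look like the following minimal sketch. logXhrRequests and XHR_TYPES are hypothetical names; the sketch treats both 'xhr' and 'fetch' as XHR-style calls and assumes request interception is not enabled, since the 'request' event fires without it. If interception is enabled, each request still has to be continued in the handler.

```javascript
// Resource types treated as XHR-style calls in this sketch.
const XHR_TYPES = new Set(['xhr', 'fetch']);

// Hypothetical helper: log only XHR/fetch requests and return the
// array that accumulates their URLs.
function logXhrRequests(page, log = console.log) {
  const seen = [];
  page.on('request', (request) => {
    if (XHR_TYPES.has(request.resourceType())) {
      seen.push(request.url());
      log(`${request.method()} ${request.url()}`);
    }
  });
  return seen;
}
```

Call logXhrRequests(page) before page.goto(...), then read the returned array once the navigation has settled.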