Using Wappalyzer and Puppeteer in Node.js
I am trying to build a scraper to monitor web projects automatically.
So far so good, the script is running, but now I want to add a feature that automatically analyses which libraries I used in the projects. The most powerful tool for this job is Wappalyzer. They have a Node package ( https://www.npmjs.com/package/wappalyzer ), and its description says you can use it in combination with Puppeteer.
I managed to run Puppeteer and log the source code of the sites to the console, but I can't figure out the right way to pass the source code to the Wappalyzer analyse function.
Do you have a hint for me?
I tried this code, but I am getting a TypeError: url.split is not a function
function getLibarys(url) {
  (async () => {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto(url);
    // get source code with puppeteer
    const html = await page.content();
    const wappalyzer = new Wappalyzer();
    (async function () {
      try {
        await wappalyzer.init()
        // Optionally set additional request headers
        const headers = {}
        const site = await wappalyzer.open(page, headers)
        // Optionally capture and output errors
        site.on('error', console.error)
        const results = await site.analyze()
        console.log(JSON.stringify(results, null, 2))
      } catch (error) {
        console.error(error)
      }
      await wappalyzer.destroy()
    })()
    await browser.close()
  })()
}
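Fixed it by using the sample code from Wappalyzer. The difference from my first attempt is that wappalyzer.open() is given the URL string rather than the Puppeteer page object, which appears to be what made url.split fail.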
function getLibarys(url) {
  const Wappalyzer = require('wappalyzer');
  const options = {
    debug: false,
    delay: 500,
    headers: {},
    maxDepth: 3,
    maxUrls: 10,
    maxWait: 5000,
    recursive: true,
    probe: true,
    proxy: false,
    userAgent: 'Wappalyzer',
    htmlMaxCols: 2000,
    htmlMaxRows: 2000,
    noScripts: false,
    noRedirect: false,
  };
  const wappalyzer = new Wappalyzer(options)
  ;(async function() {
    try {
      await wappalyzer.init()
      // Optionally set additional request headers
      const headers = {}
      const site = await wappalyzer.open(url, headers)
      // Optionally capture and output errors
      site.on('error', console.error)
      const results = await site.analyze()
      console.log(JSON.stringify(results, null, 2))
    } catch (error) {
      console.error(error)
    }
    await wappalyzer.destroy()
  })()
}
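The function above only logs the results. For a monitoring script it may be more useful to return them to the caller. The following is a minimal sketch of that idea, not part of the Wappalyzer sample: the name analyzeUrl and the injected WappalyzerClass parameter are hypothetical, and it uses only the init, open, analyze and destroy calls already shown above.

```javascript
// Sketch: analyse one URL and return the results instead of logging them.
// `analyzeUrl` and the `WappalyzerClass` parameter are illustrative names;
// pass in require('wappalyzer') (or a stub when testing).
async function analyzeUrl(url, WappalyzerClass, options = {}) {
  // Guard against the mistake from the question: open() needs a URL string.
  if (typeof url !== 'string') {
    throw new TypeError('url must be a string, not a Puppeteer page');
  }

  const wappalyzer = new WappalyzerClass(options);
  try {
    await wappalyzer.init();
    const site = await wappalyzer.open(url, {});
    // Capture and output errors raised while the site is analysed.
    site.on('error', console.error);
    return await site.analyze();
  } finally {
    // Runs on success and on failure, so the instance is always cleaned up.
    await wappalyzer.destroy();
  }
}
```

A caller could then write analyzeUrl('https://example.com', require('wappalyzer')) and await the returned promise. Errors are propagated to the caller rather than swallowed, which lets the monitoring script decide how to handle a site that cannot be analysed.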