
Using wappalyzer and puppeteer in node.js

I am trying to build a scraper to monitor web projects automatically.

So far so good: the script is running, but now I want to add a feature that automatically analyses which libraries I used in the projects. The most powerful tool for this job is Wappalyzer. They have a Node package ( https://www.npmjs.com/package/wappalyzer ), and its documentation says you can use it in combination with Puppeteer.

I managed to run Puppeteer and log the source code of the sites to the console, but I can't work out the right way to pass the source code to the Wappalyzer analyze function.

Do you guys have a hint for me?

I tried this code, but I am getting a TypeError: url.split is not a function

const puppeteer = require('puppeteer');
const Wappalyzer = require('wappalyzer');

function getLibarys(url) {
  (async () => {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto(url);

    // get source code with puppeteer
    const html = await page.content();

    const wappalyzer = new Wappalyzer();

    (async function () {
      try {
        await wappalyzer.init()

        // Optionally set additional request headers
        const headers = {}

        // NOTE: `page` is a Puppeteer Page object, not a URL string
        const site = await wappalyzer.open(page, headers)

        // Optionally capture and output errors
        site.on('error', console.error)

        const results = await site.analyze()

        console.log(JSON.stringify(results, null, 2))
      } catch (error) {
        console.error(error)
      }

      await wappalyzer.destroy()
    })()
    await browser.close()
  })()
}

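Fixed it by using the sample code from Wappalyzer. The key change is that wappalyzer.open() now receives the URL string instead of a Puppeteer page object; the package appears to launch its own browser in init(), so no separate puppeteer.launch() call is needed.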

function getLibarys(url) {
  const Wappalyzer = require('wappalyzer');

  const options = {
    debug: false,
    delay: 500,
    headers: {},
    maxDepth: 3,
    maxUrls: 10,
    maxWait: 5000,
    recursive: true,
    probe: true,
    proxy: false,
    userAgent: 'Wappalyzer',
    htmlMaxCols: 2000,
    htmlMaxRows: 2000,
    noScripts: false,
    noRedirect: false,
  };

  const wappalyzer = new Wappalyzer(options)

  ;(async function() {
    try {
      await wappalyzer.init()

      // Optionally set additional request headers
      const headers = {}

      const site = await wappalyzer.open(url, headers)

      // Optionally capture and output errors
      site.on('error', console.error)

      const results = await site.analyze()

      console.log(JSON.stringify(results, null, 2))
    } catch (error) {
      console.error(error)
    }

    await wappalyzer.destroy()
  })()
}
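
If the results are needed by the rest of the scraper rather than just printed, the same calls can be wrapped so the function returns a promise. The following is only a minimal sketch under a few assumptions: the name getLibraries and the idea of passing the Wappalyzer class in as the first argument are hypothetical (not part of the package), chosen so the wrapper can be exercised without a browser. It uses only the calls that appear in the sample above (init, open, analyze, destroy) and rejects early if something other than a URL string is passed, which is the situation that led to the url.split error.

```javascript
// Hypothetical helper: returns the analysis results instead of logging them.
// `WappalyzerClass` is expected to be the class exported by require('wappalyzer').
async function getLibraries(WappalyzerClass, url, options = {}) {
  // Guard against passing a Puppeteer page (or anything else) instead of a URL string.
  if (typeof url !== 'string') {
    throw new TypeError('getLibraries expects a URL string, got ' + typeof url);
  }

  const wappalyzer = new WappalyzerClass(options);

  try {
    await wappalyzer.init();

    // Same call sequence as the sample code: open the URL, then analyze it.
    const site = await wappalyzer.open(url, {});
    site.on('error', console.error);

    return await site.analyze();
  } finally {
    // Always release the browser, even if analysis failed.
    await wappalyzer.destroy();
  }
}

// Usage (assumes the wappalyzer package is installed):
//   getLibraries(require('wappalyzer'), 'https://example.com')
//     .then((results) => console.log(JSON.stringify(results, null, 2)))
//     .catch(console.error);
```

Returning the promise lets the caller decide how to handle both the results and any error, instead of the function swallowing them inside a nested async block.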
