这种类型的递归称为什么，我如何在 JavaScript (Node.js) 中实现它？

Question

我知道如何使用cheerio从使用request （或https ）获得的responseText字符串中提取所有href属性（锚标记）的值，然后创建（唯一）URL 的平面对象。

我不明白的是如何使用递归（无需手动编写每个循环）创建嵌套对象（嵌套对象）。

这个nested对象具有一定的深度（使用称为depth的参数方便地指定）。

例如，假设这是我的代码：

function get(url, callback) {
  // get "responseText" using "requests"
  // extract the anchor tags href from the "responseText"
  callback({"https://url1.com": {}, "https://url2.com": {});
}

get("https://www.example.com", (urls) => {
  console.log(urls);
});

运行代码时，输出应为：

{ "https://url1.com": {}, "https://url2.com": {} }

我不明白的是我如何递归地转到"https://url1.com"然后得到这个输出：

{ "https://url1.com": { "https://sub-url-1.com": {} }, "https://url2.com": { "https://sub-url-2.com": {} } }

如果深度为 5 呢？ 我将如何递归遍历每个 5 级深的子 URL，然后获取它的子 URL？

这种类型的递归称为什么，我如何在 JavaScript 中实现它？

Answer 1

从crawl开始，它需要一个起始 url (string) 和一个起始深度 (int) 并返回一个承诺的result 。 我们的结果是我们预期输出的类型（或“形状”）。 在这种情况下，它是一个以 url 字符串作为键的对象，值是一个空对象或另一个嵌套结果 -

// type url = string
// type result = (url, result) object | empty

// crawl : (string * int) -> result promise
const crawl = (initUrl = '/', initDepth = 0) =>
{ const loop = (urls, depth) =>
    parallel
      ( urls
      , u =>
          depth === 0
            ? [ u, {} ]
            : loop (get (u), depth - 1) 
                .then (r => [ u, r ])
      )
      .then (Object.fromEntries)
  return loop ([ initUrl ], initDepth)
}

垂直样式并不常见，但有助于眼睛识别在制表位垂直规则上对齐的代码元素。 开放式空白允许评论，但随着对样式的熟悉，它们变得不那么必要了 -

// type url = string
// type result = (url, result) object | empty

// crawl : (string * int) -> result promise
const crawl = (initUrl = '/', initDepth = 0) =>
{ const loop = (urls, depth) =>
    parallel             // parallel requests
      ( urls             // each url
      , u =>             // as u
          depth === 0                   // exit condition
            ? [ u, {} ]                 // base: [ key, value ]
            : loop (get (u), depth - 1) // inductive: smaller problem
                .then (r => [ u, r ])   //   [ key, value ]
      )
      .then (Object.fromEntries)        // convert [ key, value ]
                                        //      to { key: value }

  return loop ([ initUrl ], initDepth)  // init loop
}

这利用了通用实用程序parallel ，这对于处理承诺的数组很有用 -

// parallel : (('a array) promise * 'a -> 'b) -> ('b array) promise 
const parallel = async (p, f) =>
  Promise.all ((await p) .map (x => f (x)))

或者，如果您不想依赖async-await -

// parallel : (('a array) promise * 'a -> 'b) -> ('b array) promise 
const parallel = (p, f) =>
  Promise.all
    ( Promise
        .resolve (p)
        .then (r => r .map (x => f (x)))
    )

给定一个模拟的sitemap和相应的get函数 -

// sitemap : (string, string array) object
const sitemap =
  { "/": [ "/a", "/b", "/c" ]
  , "/a": [ "/a/1", "/a/11", "/a/111" ]
  , "/a/1": [ "/a/1/2", "a/1/22" ]
  , "/a/1/2": [ "/a/1/2/3" ]
  , "/a/1/2/3": [ "/a/1/2/3/4" ]
  , "/a/11": [ "/a/11/2", "a/11/22" ]
  , "/a/11/22": [ "/a/11/22/33"]
  , "/b": [ "/b/1" ]
  , "/b/1": [ "/b/1/2" ]
  }

// get : string -> (string array) promise      
const get = async (url = '') =>
  Promise
    .resolve (sitemap[url] || [] )
    .then (delay)

// delay : ('a * int) -> 'a promise
const delay = (x, ms = 250) =>
  new Promise (r => setTimeout (r, ms, x))

我们可以看到crawl在不同深度是如何响应的——

crawl ('/') .then (console.log, console.error)
// { '/': {} }

crawl ('/', 1) .then (console.log, console.error)
// { '/': { '/a': {}, '/b': {}, '/c': {} } }

crawl ('/b', 1) .then (console.log, console.error)
// { '/b': { '/b/1': {} } }

crawl ('/b', 2) .then (console.log, console.error)
// {
//   "/b": {
//     "/b/1": {
//       "/b/1/2": {}
//     }
//   }
// }

这里我们爬取深度为Infinity的根"/" ——

crawl ("/", Infinity) .then (console.log, console.error)
// {
//   "/": {
//     "/a": {
//       "/a/1": {
//         "/a/1/2": {
//           "/a/1/2/3": {
//             "/a/1/2/3/4": {}
//           }
//         },
//         "a/1/22": {}
//       },
//       "/a/11": {
//         "/a/11/2": {},
//         "a/11/22": {}
//       },
//       "/a/111": {}
//     },
//     "/b": {
//       "/b/1": {
//         "/b/1/2": {}
//       }
//     },
//     "/c": {}
//   }
// }

只需将get替换为一个真正的函数，该函数接受一个输入 url 并返回一个 href 数组 - crawl工作原理相同。

展开下面的代码段以在您自己的浏览器中验证结果 -

 const parallel = async (p, f) => Promise.all ((await p) .map (x => f (x))) const crawl = (initUrl = '/', initDepth = 0) => { const loop = (urls, depth) => parallel ( urls , u => depth === 0 ? [ u, {} ] : loop (get (u), depth - 1) .then (r => [ u, r ]) ) .then (Object.fromEntries) return loop ([ initUrl ], initDepth) } // mock const sitemap = { "/": [ "/a", "/b", "/c" ] , "/a": [ "/a/1", "/a/11", "/a/111" ] , "/a/1": [ "/a/1/2", "a/1/22" ] , "/a/1/2": [ "/a/1/2/3" ] , "/a/1/2/3": [ "/a/1/2/3/4" ] , "/a/11": [ "/a/11/2", "a/11/22" ] , "/a/11/22": [ "/a/11/22/33"] , "/b": [ "/b/1" ] , "/b/1": [ "/b/1/2" ] } const get = async (url = '') => Promise .resolve (sitemap[url] || [] ) .then (delay) const delay = (x, ms = 250) => new Promise (r => setTimeout (r, ms, x)) // demos crawl ('/') .then (console.log, console.error) // { '/': {} } crawl ('/', 1) .then (console.log, console.error) // { '/': { '/a': {}, '/b': {}, '/c': {} } } crawl ('/b', 1) .then (console.log, console.error) // { '/b': { '/b/1': {} } } crawl ('/b', 2) .then (console.log, console.error) // { // "/b": { // "/b/1": { // "/b/1/2": {} // } // } // } crawl ("/", Infinity) .then (console.log, console.error) // { // "/": { // "/a": { // "/a/1": { // "/a/1/2": { // "/a/1/2/3": { // "/a/1/2/3/4": {} // } // }, // "a/1/22": {} // }, // "/a/11": { // "/a/11/2": {}, // "a/11/22": {} // }, // "/a/111": {} // }, // "/b": { // "/b/1": { // "/b/1/2": {} // } // }, // "/c": {} // } // }

这种类型的递归称为什么，我如何在 JavaScript (Node.js) 中实现它？

问题描述

1 个解决方案

解决方案1
2 已采纳 2019-08-09 03:03:44

这种类型的递归称为什么，我如何在 JavaScript (Node.js) 中实现它？

问题描述

1 个解决方案

解决方案1 2 已采纳 2019-08-09 03:03:44

解决方案1
2 已采纳 2019-08-09 03:03:44