
Get text of previous header in HTML

I have an HTML document that looks like this:

<h1>Title</h1>
<p>Some additional content, can be multiple, various tags</p>
<h2><a id="123"></a>Foo</h2>
<p>Some additional content, can be multiple, various tags</p>
<h3><a id="456"></a>Bar</h3>

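Now, for each anchor with an id, I want to find the header hierarchy. For example, for the anchor with id="123" I want to get something like [{level: 1, title: "Title"}, {level: 2, title: "Foo"}], and likewise for the anchor with id="456" I want to get [{level: 1, title: "Title"}, {level: 2, title: "Foo"}, {level: 3, title: "Bar"}].

So far, my code looks like this: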

const linkModel: IDictionary<ILinkModelEntry> = {};
const $ = cheerio.load(html);
$("a").each((_i, elt) => {
    const anchor = $(elt);
    const id = anchor.attr().id;
    if (id) {
        const parent = anchor.parent();
        const parentTag = parent.prop("tagName");
        let headerHierarchy: any[] = [];
        if (["H1", "H2", "H3", "H4", "H5", "H6"].includes(parentTag)) {
            let level = parseInt(parentTag[1]);
            headerHierarchy = [{level, text: parent.text()}];
            level--;
            while (level > 0) {
                const prevHeader = parent.prev("h" + level);
                const text = prevHeader.text();
                headerHierarchy.unshift({level, text});
                level--;
            }
        }
        linkModel["#" + id] = {originalId: id, count: count++, headerHierarchy};
    }
});

What am I doing wrong, given that

const prevHeader = parent.prev("h" + level);
const text = prevHeader.text();

always returns an empty string (i.e. "")?

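If I understand correctly, you're looking to capture the hierarchy. If your example had another <h1> followed by more <h2>s and <h3>s, you'd want to pop the parent stack back to the new <h1> level so that later <h2> and <h3> children link to it, rather than building an array of every element back to the first <h1>Title</h1>.

Here's one approach: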

const cheerio = require("cheerio"); // ^1.0.0-rc.12

const html = `
<h1>Title</h1>
<p>Some additional content, can be multiple, various tags</p>
<h2><a id="123"></a>Foo</h2>
<p>Some additional content, can be multiple, various tags</p>
<h3><a id="456"></a>Bar</h3>
<h1>Another Title</h1>
<h2><a id="xxx"></a>Foo 2</h2>
<h3><a id="yyy"></a>Bar 2</h3>`;

const $ = cheerio.load(html);
const result = {};
const stack = [];

[...$("h1,h2,h3,h4,h5,h6")].forEach(e => {
  const level = +$(e).prop("tagName")[1];

  while (stack.length && level <= stack.at(-1).level) {
    stack.pop();
  }

  if (!stack.length || level >= stack.at(-1).level) {
    stack.push({level, title: $(e).text()});
  }

  if ($(e).has("a[id]").length) {
    const id = $(e).find("a[id]").attr("id");
    result[`#${id}`] = [...stack];
  }
});

console.log(result);

Output:

{
  '#123': [ { level: 1, title: 'Title' }, { level: 2, title: 'Foo' } ],
  '#456': [
    { level: 1, title: 'Title' },
    { level: 2, title: 'Foo' },
    { level: 3, title: 'Bar' }
  ],
  '#xxx': [
    { level: 1, title: 'Another Title' },
    { level: 2, title: 'Foo 2' }
  ],
  '#yyy': [
    { level: 1, title: 'Another Title' },
    { level: 2, title: 'Foo 2' },
    { level: 3, title: 'Bar 2' }
  ]
}
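
To make the stack handling easier to follow and test in isolation, here is a minimal sketch of the same idea on plain data, with no cheerio involved. The function name buildHierarchies and the {level, title, id} input shape are assumptions made for illustration; they are not part of the original code.

```javascript
// Hypothetical helper: `headers` lists the headers in document order,
// each as {level, title, id?}, where `id` is the id of an anchor inside it.
function buildHierarchies(headers) {
  const result = {};
  const stack = [];

  for (const {level, title, id} of headers) {
    // Headers at the same or a deeper level cannot be ancestors of this one.
    while (stack.length && level <= stack[stack.length - 1].level) {
      stack.pop();
    }
    stack.push({level, title});

    if (id !== undefined) {
      // Copy the stack so later pushes and pops do not alter this entry.
      result[`#${id}`] = [...stack];
    }
  }
  return result;
}

const hierarchies = buildHierarchies([
  {level: 1, title: "Title"},
  {level: 2, title: "Foo", id: "123"},
  {level: 3, title: "Bar", id: "456"},
  {level: 1, title: "Another Title"},
  {level: 3, title: "Deep", id: "zzz"}, // skips the h2 level
]);

console.log(hierarchies);
```

Because the loop only compares levels, a header that skips a level (an <h3> directly under an <h1>) simply gets a two-entry chain.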

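If you really do want the whole chain of ancestors linearly back to the first <h1>, then remove the while loop from the cheerio code above (unlikely to be your intent).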
