Get text of previous header in HTML
I have an HTML document that looks like this:
<h1>Title</h1>
<p>Some additional content, can be multiple, various tags</p>
<h2><a id="123"></a>Foo</h2>
<p>Some additional content, can be multiple, various tags</p>
<h3><a id="456"></a>Bar</h3>
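Now, for each anchor with an id, I want to find the header hierarchy. For example, for the anchor with id="123" I want to get something like [{level: 1, title: "Title"}, {level: 2, title: "Foo"}], and likewise for the anchor with id="456" I want to get [{level: 1, title: "Title"}, {level: 2, title: "Foo"}, {level: 3, title: "Bar"}].
So far, my code looks like this: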
const linkModel: IDictionary<ILinkModelEntry> = {};
const $ = cheerio.load(html);
$("a").each((_i, elt) => {
  const anchor = $(elt);
  const id = anchor.attr().id;
  if (id) {
    const parent = anchor.parent();
    const parentTag = parent.prop("tagName");
    let headerHierarchy: any[] = [];
    if (["H1", "H2", "H3", "H4", "H5", "H6"].includes(parentTag)) {
      let level = parseInt(parentTag[1]);
      headerHierarchy = [{level, text: parent.text()}];
      level--;
      while (level > 0) {
        const prevHeader = parent.prev("h" + level);
        const text = prevHeader.text();
        headerHierarchy.unshift({level, text});
        level--;
      }
    }
    linkModel["#" + id] = {originalId: id, count: count++, headerHierarchy};
  }
});
What am I doing wrong? The lines
const prevHeader = parent.prev("h" + level);
const text = prevHeader.text();
always return an empty string (i.e. "").
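If I understand correctly, you're looking to capture the hierarchy. If your example had another <h1> followed by more <h2>s and <h3>s, you'd want to pop the parent stack back to the new <h1> level so that later <h2> and <h3> children link to it, rather than keeping an array of every element back to the first <h1>Title</h1>.
Here's one approach: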
const cheerio = require("cheerio"); // ^1.0.0-rc.12
const html = `
<h1>Title</h1>
<p>Some additional content, can be multiple, various tags</p>
<h2><a id="123"></a>Foo</h2>
<p>Some additional content, can be multiple, various tags</p>
<h3><a id="456"></a>Bar</h3>
<h1>Another Title</h1>
<h2><a id="xxx"></a>Foo 2</h2>
<h3><a id="yyy"></a>Bar 2</h3>`;
const $ = cheerio.load(html);
const result = {};
const stack = []; // current chain of ancestor headers, shallowest first
[...$("h1,h2,h3,h4,h5,h6")].forEach(e => {
  const level = +$(e).prop("tagName")[1];
  // Pop headers at the same or a deeper level than the current one.
  while (stack.length && level <= stack.at(-1).level) {
    stack.pop();
  }
  if (!stack.length || level >= stack.at(-1).level) {
    stack.push({level, title: $(e).text()});
  }
  // Record a snapshot of the chain for headers that contain an anchor with an id.
  if ($(e).has("a[id]").length) {
    const id = $(e).find("a[id]").attr("id");
    result[`#${id}`] = [...stack];
  }
});
console.log(result);
Output:
{
  '#123': [ { level: 1, title: 'Title' }, { level: 2, title: 'Foo' } ],
  '#456': [
    { level: 1, title: 'Title' },
    { level: 2, title: 'Foo' },
    { level: 3, title: 'Bar' }
  ],
  '#xxx': [
    { level: 1, title: 'Another Title' },
    { level: 2, title: 'Foo 2' }
  ],
  '#yyy': [
    { level: 1, title: 'Another Title' },
    { level: 2, title: 'Foo 2' },
    { level: 3, title: 'Bar 2' }
  ]
}
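The stack logic can also be tried without parsing any HTML. The following is a minimal sketch under that assumption: `buildHierarchies` is a hypothetical helper name, and its input is a plain array of {level, title, id} records in document order, standing in for the header elements cheerio would return.

```javascript
// Hypothetical helper: `headers` is an array of {level, title, id?} records
// in document order; `id` is present only for headers that contain an anchor.
function buildHierarchies(headers) {
  const result = {};
  const stack = [];
  for (const {level, title, id} of headers) {
    // Drop every entry at the same or a deeper level than the new header.
    while (stack.length && level <= stack.at(-1).level) {
      stack.pop();
    }
    stack.push({level, title});
    if (id !== undefined) {
      // Copy the array so later pops don't change the stored chain.
      result[`#${id}`] = [...stack];
    }
  }
  return result;
}

const hierarchies = buildHierarchies([
  {level: 1, title: "Title"},
  {level: 2, title: "Foo", id: "123"},
  {level: 3, title: "Bar", id: "456"},
  {level: 1, title: "Another Title"},
  {level: 2, title: "Foo 2", id: "xxx"},
]);
console.log(hierarchies["#xxx"]);
```

Keeping the traversal separate from the HTML parsing like this makes the pop-then-push rule easy to check against hand-written input.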
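If you really want the entire ancestor chain running linearly back to the first header, remove the while loop (and the level check around the push, which would otherwise skip shallower headers). That's unlikely to be your intent, though.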
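As for why parent.prev("h" + level) always came back empty: .prev(selector) looks only at the immediately preceding sibling and keeps it only if it matches the selector. It does not search backward. In the sample HTML the sibling right before each header is a <p>, so the selection is empty and .text() returns "". A backward search would need something like .prevAll("h" + level).first(), assuming cheerio follows jQuery's ordering here (nearest sibling first). The sketch below models the difference with a plain array rather than cheerio; `prevIndex` and `prevAllFirstIndex` are hypothetical stand-ins, not cheerio's implementation.

```javascript
// Plain-array model of a sibling list; tag names stand in for elements.
const siblings = ["h1", "p", "h2", "p", "h3"];

// Models `.prev(selector)`: inspect only the immediately preceding sibling.
// Returns its index if it matches, otherwise -1.
function prevIndex(index, tag) {
  return index > 0 && siblings[index - 1] === tag ? index - 1 : -1;
}

// Models `.prevAll(selector).first()`: walk backward to the nearest match.
function prevAllFirstIndex(index, tag) {
  for (let i = index - 1; i >= 0; i--) {
    if (siblings[i] === tag) return i;
  }
  return -1;
}

const h3Index = siblings.indexOf("h3");
console.log(prevIndex(h3Index, "h2"));         // -1: the previous sibling is a <p>
console.log(prevAllFirstIndex(h3Index, "h2")); // 2: the nearest preceding <h2>
```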