
Add arrays into multi-dimensional array or object

I'm parsing content generated by a WYSIWYG editor into a table of contents widget in React.

So far I'm looping through the headers and adding them into an array.

How can I get them all into one multi-dimensional array or object (whichever is the best way) so that it looks more like:

h1-1
    h2-1
        h3-1

h1-2
    h2-2
        h3-2

h1-3
    h2-3
        h3-3

and then I can render it with an ordered list in the UI.

const str = "<h1>h1-1</h1><h2>h2-1</h2><h3>h3-1</h3><p>something</p><h1>h1-2</h1><h2>h2-2</h2><h3>h3-2</h3>";

const patternh1 = /<h1>(.*?)<\/h1>/g;
const patternh2 = /<h2>(.*?)<\/h2>/g;
const patternh3 = /<h3>(.*?)<\/h3>/g;

let h1s = [];
let h2s = [];
let h3s = [];

let matchh1, matchh2, matchh3;

while (matchh1 = patternh1.exec(str))
    h1s.push(matchh1[1])

while (matchh2 = patternh2.exec(str))
    h2s.push(matchh2[1])

while (matchh3 = patternh3.exec(str))
    h3s.push(matchh3[1])

console.log(h1s)
console.log(h2s)
console.log(h3s)

I don't know about you, but I hate parsing HTML using regexes. Instead, I think it's a better idea to let the DOM handle this:

const str = `<h1>h1-1</h1>
<h3>h3-1</h3>
<h3>h3-2</h3>
<p>something</p>
<h1>h1-2</h1>
<h2>h2-2</h2>
<h3>h3-2</h3>`;

const wrapper = document.createElement('div');
wrapper.innerHTML = str.trim();

let tree = [];
let leaf = null;

for (const node of wrapper.querySelectorAll("h1, h2, h3, h4, h5, h6")) {
  const nodeLevel = parseInt(node.tagName[1]);
  const newLeaf = {
    level: nodeLevel,
    text: node.textContent,
    children: [],
    parent: leaf
  };

  // climb back up until we find a heading of strictly higher rank
  while (leaf && newLeaf.level <= leaf.level)
    leaf = leaf.parent;

  if (!leaf)
    tree.push(newLeaf);
  else
    leaf.children.push(newLeaf);

  leaf = newLeaf;
}

console.log(tree);

This answer does not require h3 to follow h2; h3 can follow h1 if you so please. If you want to turn this into an ordered list, that can also be done:

const str = `<h1>h1-1</h1>
<h3>h3-1</h3>
<h3>h3-2</h3>
<p>something</p>
<h1>h1-2</h1>
<h2>h2-2</h2>
<h3>h3-2</h3>`;

const wrapper = document.createElement('div');
wrapper.innerHTML = str.trim();

let tree = [];
let leaf = null;

for (const node of wrapper.querySelectorAll("h1, h2, h3, h4, h5, h6")) {
  const nodeLevel = parseInt(node.tagName[1]);
  const newLeaf = {
    level: nodeLevel,
    text: node.textContent,
    children: [],
    parent: leaf
  };

  while (leaf && newLeaf.level <= leaf.level)
    leaf = leaf.parent;

  if (!leaf)
    tree.push(newLeaf);
  else
    leaf.children.push(newLeaf);

  leaf = newLeaf;
}

const ol = document.createElement("ol");

(function makeOl(ol, leaves) {
  for (const leaf of leaves) {
    const li = document.createElement("li");
    li.appendChild(new Text(leaf.text));

    if (leaf.children.length > 0) {
      const subOl = document.createElement("ol");
      makeOl(subOl, leaf.children);
      li.appendChild(subOl);
    }

    ol.appendChild(li);
  }
})(ol, tree);

// add it to the DOM
document.body.appendChild(ol);

// or get it as text
const result = ol.outerHTML;

Since the HTML is parsed by the DOM and not by a regex, this solution will not encounter any errors if the h1 tags have attributes, for example.
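The nesting step itself does not depend on the DOM. As a minimal sketch, assuming the headings have already been collected as `{ level, text }` pairs in document order, a hypothetical `buildOutline` helper (not part of the answer above) can do the same work with an explicit stack instead of `parent` pointers. Dropping the `parent` back-references also keeps the result serializable with `JSON.stringify`:

```javascript
// Hypothetical helper: nest a flat, document-ordered list of headings.
function buildOutline(headings) {
  const tree = [];
  const stack = []; // chain of currently open ancestors, outermost first

  for (const { level, text } of headings) {
    const node = { level, text, children: [] };

    // Close every open heading that is not of strictly higher rank.
    while (stack.length > 0 && stack[stack.length - 1].level >= level) {
      stack.pop();
    }

    const parent = stack[stack.length - 1];
    if (parent) {
      parent.children.push(node);
    } else {
      tree.push(node);
    }
    stack.push(node);
  }

  return tree;
}

// Same heading sequence as the snippet above (h3 directly under h1).
const outline = buildOutline([
  { level: 1, text: "h1-1" },
  { level: 3, text: "h3-1" },
  { level: 3, text: "h3-2" },
  { level: 1, text: "h1-2" },
  { level: 2, text: "h2-2" },
  { level: 3, text: "h3-2" },
]);

console.log(JSON.stringify(outline, null, 2));
```

In React, a tree of this shape could be mapped to nested `<ol>`/`<li>` elements by a small recursive component.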

You can simply gather all h* and then iterate over them to construct a tree as such:

Using ES6 (I inferred this is OK from your usage of const and let)

const str = `
    <h1>h1-1</h1>
    <h2>h2-1</h2>
    <h3>h3-1</h3>
    <p>something</p>
    <h1>h1-2</h1>
    <h2>h2-2</h2>
    <h3>h3-2</h3>
`
// \1 requires the closing tag to have the same level as the opening tag
const patternh = /<h(\d)>(.*?)<\/h\1>/g;

let hs = [];

let matchh;

while (matchh = patternh.exec(str))
    hs.push({ lev: matchh[1], text: matchh[2] })

console.log(hs)

// constructs a tree with the format [{ value: ..., children: [{ value: ..., children: [...] }, ...] }, ...]
const add = (res, lev, what) => {
  if (lev === 0) {
    res.push({ value: what, children: [] });
  } else {
    add(res[res.length - 1].children, lev - 1, what);
  }
}

// reduces all hs found into a tree using above method starting with an empty list
const tree = hs.reduce((res, { lev, text }) => {
  add(res, lev-1, text);
  return res;
}, []);

console.log(tree);

But because your HTML headers are not in a tree structure themselves (which I guess is your use case), this only works under certain assumptions, e.g. you cannot have an <h3> unless there's an <h2> above it and an <h1> above that. It also assumes that a lower-level header always belongs to the latest header of the immediately higher level.
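If skipped levels do occur in the input, one possible adaptation is to store the heading level on each node and descend only while the latest node outranks the new heading. This is a sketch under that assumption; `addLenient` and the extra `lev` property on the nodes are hypothetical and not part of the code above:

```javascript
// Hypothetical variant of `add`: each node remembers its heading level,
// so a skipped level (e.g. <h3> directly after <h1>) no longer throws.
const addLenient = (res, lev, what) => {
  const last = res[res.length - 1];
  if (last && last.lev < lev) {
    // The latest node at this depth outranks the new heading: descend.
    addLenient(last.children, lev, what);
  } else {
    // No suitable parent at this depth: attach the heading here.
    res.push({ lev, value: what, children: [] });
  }
};

const hsSkipped = [
  { lev: "1", text: "h1-1" },
  { lev: "3", text: "h3-1" },
  { lev: "3", text: "h3-2" },
  { lev: "2", text: "h2-1" },
];

const lenientTree = hsSkipped.reduce((res, { lev, text }) => {
  addLenient(res, Number(lev), text); // regex captures are strings
  return res;
}, []);

console.log(JSON.stringify(lenientTree, null, 2));
```

Here both `h3` entries and the later `h2` end up as children of `h1-1`, instead of the second `h3` being nested under the first.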

If you want to further use the tree structure, e.g. for rendering a representative ordered list for a TOC, you can do something like:

// function to render a bunch of <li>s
const renderLIs = children => children.map(child => `<li>${renderOL(child)}</li>`).join('');

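// function to render a tree node: its value, followed by an <ol> of its children (if any)
// (the value goes before the <ol>, since text is not allowed directly inside an <ol>)
const renderOL = tree => tree.children.length > 0 ? `${tree.value}<ol>${renderLIs(tree.children)}</ol>` : tree.value;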

// use a root node for the TOC
const toc = renderOL({ value: 'TOC', children: tree });

console.log(toc);
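Because the heading text comes from WYSIWYG output, it may be worth escaping it before building an HTML string. The following is only a sketch: `escapeHtml` and `renderList` are hypothetical names, and the sample tree is made up to match the `{ value, children }` format used above.

```javascript
// Hypothetical helper: escape characters that are special in HTML text.
const escapeHtml = s =>
  String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

// Render a list of { value, children } nodes as one <ol>,
// nesting each child list inside its parent <li>.
const renderList = nodes =>
  nodes.length === 0
    ? ""
    : `<ol>${nodes
        .map(n => `<li>${escapeHtml(n.value)}${renderList(n.children)}</li>`)
        .join("")}</ol>`;

const sampleTree = [
  { value: "h1-1", children: [{ value: "h2-1 <draft>", children: [] }] },
  { value: "h1-2", children: [] },
];

console.log(renderList(sampleTree));
// <ol><li>h1-1<ol><li>h2-1 &lt;draft&gt;</li></ol></li><li>h1-2</li></ol>
```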

Hope it helps.

What you want to do is known as (a variant of) a document outline, i.e. creating a nested list from the headings of a document, honoring their hierarchy.

A simple implementation for the browser using the DOM and DOMParser APIs goes as follows (put into an HTML page and coded in ES5 for easy testing):

<!DOCTYPE html>
<html>
<head>
<title>Document outline</title>
</head>
<body>
<div id="outline"></div>
<script>

// test string wrapped in a document (and body) element
var str = "<html><body><h1>h1-1</h1><h2>h2-1</h2><h3>h3-1</h3><p>something</p><h1>h1-2</h1><h2>h2-2</h2><h3>h3-2</h3></body></html>";

// utility for traversing a DOM and emitting SAX-like startElement/endElement events
function emitSAXLikeEvents(node, handler) {
    handler.startElement(node)
    for (var i = 0; i < node.children.length; i++)
        emitSAXLikeEvents(node.children.item(i), handler)
    handler.endElement(node)
}

var outline = document.getElementById('outline')
var rank = 0
var context = outline
emitSAXLikeEvents(
    (new DOMParser()).parseFromString(str, "text/html").body,
    {
        startElement: function(node) {
            if (/^h[1-6]$/.test(node.localName)) {
                var newRank = +node.localName.charAt(1)

                // set context li node to append
                while (newRank <= rank--)
                    context = context.parentNode.parentNode

                rank = newRank

                // create (if 1st li) or
                // get (if 2nd or subsequent li) ol element
                var ol
                if (context.children.length > 0)
                    ol = context.children[0]
                else {
                    ol = document.createElement('ol')
                    context.appendChild(ol)
                }

                // create and append li with text from
                // heading element
                var li = document.createElement('li')
                li.appendChild(
                  document.createTextNode(node.innerText))
                ol.appendChild(li)

                context = li
            }
        },
        endElement: function(node) {}
    })
</script>
</body>
</html>

I'm first parsing your fragment into a Document, then traversing it to create SAX-like startElement() calls. In the startElement() function, the rank of a heading element is checked against the rank of the most recently created list item (if any). Then a new list item is appended at the correct hierarchy level, and an ol element is created as a container for it if needed. Note that the algorithm as it stands won't work with "jumping" from h1 to h3 in the hierarchy, but it can be easily adapted.
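To see the traversal order without a browser, the same pre-order walk can be tried on plain objects. This is only an illustration: the `el` factory, the `walk` function and the `{ localName, text, children }` shape are hypothetical stand-ins for real DOM elements.

```javascript
// Hypothetical stand-in for DOM elements: plain { localName, text, children } objects.
const el = (localName, text = "", children = []) => ({ localName, text, children });

// Same traversal shape as emitSAXLikeEvents, over plain objects.
function walk(node, handler) {
  handler.startElement(node);
  for (const child of node.children) walk(child, handler);
  handler.endElement(node);
}

const body = el("body", "", [
  el("h1", "h1-1"),
  el("div", "", [el("h2", "h2-1"), el("p", "something")]),
  el("h1", "h1-2"),
]);

// Collect headings in the order their startElement events fire.
const seen = [];
walk(body, {
  startElement(node) {
    if (/^h[1-6]$/.test(node.localName)) {
      seen.push({ rank: Number(node.localName[1]), text: node.text });
    }
  },
  endElement() {},
});

console.log(seen);
```

The headings arrive in document order even when some of them sit inside wrapper elements, which is what the rank comparison in startElement() relies on.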

If you want to create an outline/table of contents on Node.js, the code could be made to run server-side, but it requires a decent HTML parsing library (a DOMParser polyfill for Node.js, so to speak). There are also the https://github.com/h5o/h5o-js and the https://github.com/hoyois/html5outliner packages for creating outlines, though I haven't tested those. These packages supposedly can also deal with corner cases such as heading elements in iframe and quote elements, which you generally don't want in the outline of your document.

The topic of creating an HTML5 outline has a long history; see e.g. http://html5doctor.com/computer-says-no-to-html5-document-outline/ . HTML4's practice of using no sectioning roots (in HTML5 parlance), i.e. wrapper elements for sectioning, and placing headings and content at the same hierarchy level is known as "flat-earth markup". SGML has the RANK feature for dealing with H1, H2, etc. ranked elements, and can be made to infer omitted section elements, and thus automatically create an outline, from HTML4-like "flat-earth markup" in simple cases (e.g. where only section or another single element is allowed as sectioning root).

I'll use a single regex to get the <hx></hx> contents and then group them by x using Array.reduce.

Here is the base, but it's not finished yet:

// The string you need to parse
const str = "\
<h1>h1-1</h1>\
<h2>h2-1</h2>\
<h3>h3-1</h3>\
<p>something</p>\
<h1>h1-2</h1>\
<h2>h2-2</h2>\
<h3>h3-2</h3>";

// The regex that will cut out the <hx>something</hx>
const regex = /<h[0-9]{1}>(.*?)<\/h[0-9]{1}>/g;

// We get the matches now
const matches = str.match(regex);

// We group the hx together as requested
const matchesSorted = Object.values(matches.reduce((tmp, x) => {
  // We get the number behind hx ---> the x
  const hNumber = x[2];

  // If the container does not exist, create it
  if (!tmp[hNumber]) {
    tmp[hNumber] = [];
  }

  // Push the new parsed content into the array
  // 4 is to start after <hx>
  // length - 9 is to get all except <hx></hx>
  tmp[hNumber].push(x.substr(4, x.length - 9));

  return tmp;
}, {}));

console.log(matchesSorted);


As you are parsing HTML content, I want to make you aware of special cases such as the presence of \n or spaces. For example, look at the following non-working snippet:

// The string you need to parse
const str = "\
<h1>h1-1\n\
</h1>\
<h2> h2-1</h2>\
<h3>h3-1</h3>\
<p>something</p>\
<h1>h1-2 </h1>\
<h2>h2-2 \n\
</h2>\
<h3>h3-2</h3>";

// The regex that will cut out the <hx>something</hx>
const regex = /<h[0-9]{1}>(.*?)<\/h[0-9]{1}>/g;

// We get the matches now
const matches = str.match(regex);

// We group the hx together as requested
const matchesSorted = Object.values(matches.reduce((tmp, x) => {
  // We get the number behind hx ---> the x
  const hNumber = x[2];

  // If the container does not exist, create it
  if (!tmp[hNumber]) {
    tmp[hNumber] = [];
  }

  // Push the new parsed content into the array
  // 4 is to start after <hx>
  // length - 9 is to get all except <hx></hx>
  tmp[hNumber].push(x.substr(4, x.length - 9));

  return tmp;
}, {}));

console.log(matchesSorted);


We have to add .replace() and .trim() in order to remove unwanted \n and spaces.

Use this snippet:

// The string you need to parse
const str = "\
<h1>h1-1\n\
</h1>\
<h2> h2-1</h2>\
<h3>h3-1</h3>\
<p>something</p>\
<h1>h1-2 </h1>\
<h2>h2-2 \n\
</h2>\
<h3>h3-2</h3>";

// Remove all unwanted \n
const preparedStr = str.replace(/(\r\n\t|\n|\r\t)/gm, "");

// The regex that will cut out the <hx>something</hx>
const regex = /<h[0-9]{1}>(.*?)<\/h[0-9]{1}>/g;

// We get the matches now
const matches = preparedStr.match(regex);

// We group the hx together as requested
const matchesSorted = Object.values(matches.reduce((tmp, x) => {
  // We get the number behind hx ---> the x
  const hNumber = x[2];

  // If the container does not exist, create it
  if (!tmp[hNumber]) {
    tmp[hNumber] = [];
  }

  // Push the new parsed content into the array
  // 4 is to start after <hx>
  // length - 9 is to get all except <hx></hx>
  // call trim() to remove unwanted spaces
  tmp[hNumber].push(x.substr(4, x.length - 9).trim());

  return tmp;
}, {}));

console.log(matchesSorted);
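As an alternative to stripping newlines first, the pattern itself can be made tolerant of them. The sketch below is an assumption-laden variant, not the snippet above: `extractHeadings` is a hypothetical helper, it keeps the headings in document order instead of grouping them by level, and it is still a regex, so unusual markup (comments, nested headings) can defeat it.

```javascript
// Hypothetical helper: extract headings as { level, text } in document order.
function extractHeadings(html) {
  // \b[^>]* tolerates attributes, [\s\S] also matches newlines, and \1
  // requires the closing tag to have the same level as the opening tag.
  const pattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;

  return Array.from(html.matchAll(pattern), ([, level, inner]) => ({
    level: Number(level),
    // collapse runs of whitespace (including \n) and trim both ends
    text: inner.replace(/\s+/g, " ").trim(),
  }));
}

const messy =
  '<h1 class="title">h1-1\n  </h1>\n<h2> h2-1</h2><p>something</p><h3>h3-1 </h3>';

console.log(extractHeadings(messy));
```

Keeping the level next to the text preserves the information needed to nest the headings afterwards, which the grouped arrays lose.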

I wrote this code using jQuery. (Please don't downvote; maybe someone will need a jQuery answer later.)

This recursive function creates li elements from the string, and if an item has children, it converts them to an ol.

const str = "<div><h1>h1-1</h1><h2>h2-1</h2><h3>h3-1</h3></div><p>something</p><h1>h1-2</h1><h2>h2-2</h2><h3>h3-2</h3>";

function strToList(stri) {
  const tags = $(stri);

  function partToList(el) {
    let output = "<li>";
    if ($(el).children().length) {
      output += "<ol>";
      $(el)
        .children()
        .each(function() {
          output += partToList($(this));
        });
      output += "</ol>";
    } else {
      output += $(el).text();
    }
    return output + "</li>";
  }

  let output = "<ol>";
  tags.each(function(itm) {
    output += partToList($(this));
  });
  return output + "</ol>";
}

$("#output").append(strToList(str));

li {
  padding: 10px;
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="output"></div>

(This code can be converted to pure JS easily)
