
Not able to extract groups of start and end HTML tags in JavaScript array

I have this JavaScript array:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1,"and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, "  added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

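I am trying to extract groups of array elements based on the following conditions.

Take this sub-array of the array above as an example: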

[1, "<strong>"],
[0, "the"],
[1, "</strong>"]

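This sub-array is one conditional group. The condition is that a[0] == 1 and a[1] is the start of an HTML tag. Here a[1] contains <strong>, which is the start of a valid HTML tag, so I want to push every element from the opening tag through to the closing tag.

The groups I expect are: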

let group = [
  {
    [1,"<strong>"],
    [0,"the"],
    [1,"</strong>"]
  },
  {
    [1,"<ul><li>test</li></ul>"]
  }
];  

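I want to extract groups based on the following conditions:

  1. The element's first index is 1, i.e. a[i][0] == 1, and a[i][1] is the start of a valid HTML tag.
  2. The element's first index is 0, i.e. a[i][0] == 0, and it sits between the elements matched by rules 1 and 3.
  3. The element's first index is 1, i.e. a[i][0] == 1, and a[i][1] is the end of a valid HTML tag.

Together, these 3 rules make up one group, or one JavaScript object.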
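The three rules above can be sketched as a small classifier. This is only an illustration: `OPEN_TAG`, `CLOSE_TAG` and `classify` are hypothetical names that do not appear in the question, and the patterns assume the entry holds nothing but a single bare tag (an attribute value containing `/` would not match `OPEN_TAG`).

```javascript
// Hypothetical helpers, not part of the original question.
const OPEN_TAG = /^<[a-zA-Z][^<>/]*>$/;   // a lone opening tag, e.g. "<strong>"
const CLOSE_TAG = /^<\/[a-zA-Z][^<>]*>$/; // a lone closing tag, e.g. "</strong>"

function classify(entry) {
  const [flag, text] = entry;
  const trimmed = text.trim();
  if (flag === 1 && OPEN_TAG.test(trimmed)) return "open";   // rule 1
  if (flag === 1 && CLOSE_TAG.test(trimmed)) return "close"; // rule 3
  if (flag === 0) return "unchanged";                         // rule 2 candidate
  return "other";                                             // e.g. [-1, "and"] or [1, "test"]
}

const sample = [[1, "<strong>"], [0, "the"], [1, "</strong>"], [-1, "and"], [1, "test"]];
console.log(JSON.stringify(sample.map(classify)));
// ["open","unchanged","close","other","other"]
```

Note that an "unchanged" entry only belongs to a group when it lies between an "open" and its matching "close", so classification alone is not enough to build the groups.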

There can also be a case such as:

[1,"<ul><li>test</li></ul>"]

Here a single array item contains the whole group <ul><li>test</li></ul>, and it should also be included in the final result array.
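One possible way to recognise such a self-contained item is to check that every tag opened inside the string is also closed inside it. The sketch below assumes simple, well-nested markup without void elements such as `<br>`; `isBalancedHtml` is a hypothetical helper name.

```javascript
// Hypothetical helper: true when the string contains at least one tag
// and every opening tag is closed in the right order within the same string.
function isBalancedHtml(text) {
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  const open = []; // stack of tag names that are still open
  let seen = 0;

  for (const [, slash, name] of text.matchAll(tagPattern)) {
    seen++;
    if (slash) {
      // A closing tag must match the most recently opened tag.
      if (open.pop() !== name) return false;
    } else {
      open.push(name);
    }
  }
  return seen > 0 && open.length === 0;
}

console.log(isBalancedHtml("<ul><li>test</li></ul>")); // true
console.log(isBalancedHtml("<strong>"));               // false
console.log(isBalancedHtml("test"));                   // false
```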

Edit

I have updated my approach:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1, "and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, " added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

checkAndRemoveGroups(a, 1);

function checkAndRemoveGroups(arr, group) {
    let htmlOpenRegex = /<([\w \d \s]+)([^<]+)([^<]+) *[^/?]>/g;
    let groupArray = new Array();
    let depth = 0;

    // Iterate the array to find out groups and push the items
    for (let i = 0; i < arr.length; i++) {
        if (arr[i][0] == group && arr[i][1].match(htmlOpenRegex)) {
            depth += 1;
            groupArray.push({
                Index: i,
                Value: arr[i],
                TagType: "Open"
            });
        }
    }
    console.log(groupArray);
}

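You could keep an array of the currently open tags, push opening tags onto it and remove them when the matching closing tag appears, then check its length to see whether more tags are still needed to close the outermost tag.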

function getTags(string) {
    var regex = /<(\/?[^>]+)>/g,
        m,
        result = [];

    while ((m = regex.exec(string)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        result.push(m[1]);
    }
    return result;
}

var array = [
        [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
        [1, "<strong>"],
        [0, "the"],
        [1, "</strong>"],
        [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
        [-1, "and"],
        [1, "test"],
        [0, " scrambled it to make a type"],
        [1, " added"],
        [0, "</p>"],
        [1, "<ul><li>test</li></ul>"]
    ],
    result = [],
    nested = [], // stack of tag names that are still open
    tags,
    i = 0;

while (i < array.length) {
    if (array[i][0] === 1) {
        tags = getTags(array[i][1]);
        if (!tags.length) {
            i++;
            continue;
        }
        result.push([]); // new group found
        while (i < array.length) {
            tags.forEach(function (t) {
                if (t.startsWith('/')) {
                    // closing tag: pop only if it matches the innermost open tag
                    if (nested[nested.length - 1] === t.slice(1)) {
                        nested.length--;
                    }
                    return;
                }
                nested.push(t);
            });
            result[result.length - 1].push(array[i]);
            if (!nested.length) {
                break; // all tags closed, the group is complete
            }
            i++;
            // guard against reading past the end when a tag is never closed
            if (i < array.length) {
                tags = getTags(array[i][1]);
            }
        }
    }
    i++;
}

console.log(result);
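If the grouping has to be reused, the same open-tag stack idea could be wrapped in a function. The following is a sketch rather than the answer's own code: `extractGroups` and its `flag` parameter are hypothetical names, it stores only the tag name on the stack so attributes are ignored, and it assumes no void elements such as `<br>`. Unlike the loop above, it drops a group whose tags are never closed.

```javascript
// Hypothetical reusable variant of the stack approach.
function extractGroups(entries, flag = 1) {
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  const groups = [];
  let current = null; // group being collected, or null when outside a group
  let open = [];      // stack of tag names still waiting for a closing tag

  for (const entry of entries) {
    const [entryFlag, text] = entry;
    const tags = [...text.matchAll(tagPattern)];

    if (current === null) {
      // A group can only start on a flagged entry that contains a tag.
      if (entryFlag !== flag || tags.length === 0) continue;
      current = [];
    }

    current.push(entry);
    for (const [, slash, name] of tags) {
      if (!slash) open.push(name);
      else if (open[open.length - 1] === name) open.pop();
    }

    if (open.length === 0) {
      groups.push(current); // no tag is left open: the group is complete
      current = null;
    }
  }
  return groups; // a group still open at the end of the input is dropped
}

const sample = [
  [0, "<p>intro "],
  [1, "<strong>"],
  [0, "the"],
  [1, "</strong>"],
  [1, "test"],
  [0, "</p>"],
  [1, "<ul><li>test</li></ul>"],
];
console.log(JSON.stringify(extractGroups(sample)));
// [[[1,"<strong>"],[0,"the"],[1,"</strong>"]],[[1,"<ul><li>test</li></ul>"]]]
```

As in the loop above, every entry between the opening and closing tag is kept in the group regardless of its flag.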

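I'm with Scott on this: I think there has to be a better way to do what you're trying to do. I understand that you are trying to pull something out of this array, but there may be an entirely different way to approach the problem in which the HTML is not nested inside sub-arrays.

- EDITED - I misunderstood what you were looking for, so my original reply didn't actually show you what was going wrong, and I have removed it. I'm taking another look at this.

Is this exactly what you want to receive? If you check each element against an HTML regular expression, I don't see how you would ever get [0,"the"]. Each element would end up in its own object, which doesn't seem to be what you want.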

let group = [
  {
    [1,"<strong>"],
    [0,"the"],
    [1,"</strong>"]
  },
  {
    [1,"<ul><li>test</li></ul>"]
  }
];  
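To illustrate the concern, here is a minimal sketch of per-element checking. The `hasTag` pattern is a simplified stand-in chosen for this example, not the regex from the question.

```javascript
const entries = [[1, "<strong>"], [0, "the"], [1, "</strong>"]];

// Simplified stand-in pattern: matches any opening or closing tag.
const hasTag = /<\/?[a-zA-Z][^>]*>/;

// Testing each element on its own keeps only the entries that contain a tag.
const matched = entries.filter(([flag, text]) => flag === 1 && hasTag.test(text));

console.log(JSON.stringify(matched));
// [[1,"<strong>"],[1,"</strong>"]]
```

The entry [0, "the"] never matches on its own, so it can only be collected by tracking whether a tag is currently open while walking the array.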

