Not able to extract groups of start and end HTML tags in JavaScript array

I have this JavaScript array:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1,"and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, "  added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

I am trying to extract groups from the array based on the following condition.

Take a subarray of the above array as an example:

[1, "<strong>"],
[0, "the"],
[1, "</strong>"]

This subarray is a group because its first element has a[0] == 1 and a[1] is the start of an HTML tag. Here a[1] contains <strong>, which is a valid opening tag, so I want to push all elements from the start tag through the matching end tag.

For example, the expected groups would look like this:

let group = [
  {
    [1,"<strong>"],
    [0,"the"],
    [1,"</strong>"]
  },
  {
    [1,"<ul><li>test</li></ul>"]
  }
];  

I want to extract the groups based on the following conditions:

  1. The element's first item is 1, that is a[i][0] == 1, and a[i][1] is the start of a valid HTML tag.
  2. The element's first item is 0, that is a[i][0] == 0, and the element is preceded by an element matching rule 1 and followed by an element matching rule 3.
  3. The element's first item is 1, that is a[i][0] == 1, and a[i][1] is the end of a valid HTML tag.

Together, these three rules define one group (a JavaScript object).

There can also be a scenario like this:
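The three rules can be expressed as small predicates. The following is only a minimal sketch, not a full HTML parser: the names `isOpenTag`, `isCloseTag`, and `classify` are hypothetical, and the regexes assume that a tag entry holds exactly one simple tag with no `/` or `>` inside its attributes.

```javascript
// Hypothetical helpers; the regexes are a simplification, not an HTML parser.
const OPEN_TAG = /^<[a-zA-Z][^<>\/]*>$/;   // e.g. "<strong>"
const CLOSE_TAG = /^<\/[a-zA-Z][^<>]*>$/;  // e.g. "</strong>"

function isOpenTag(entry) {
    return entry[0] === 1 && OPEN_TAG.test(entry[1].trim());
}

function isCloseTag(entry) {
    return entry[0] === 1 && CLOSE_TAG.test(entry[1].trim());
}

// Label each entry by the rule it could satisfy.
function classify(entry) {
    if (isOpenTag(entry)) return "open";     // rule 1
    if (isCloseTag(entry)) return "close";   // rule 3
    if (entry[0] === 0) return "content";    // candidate for rule 2
    return "other";
}

const sample = [[1, "<strong>"], [0, "the"], [1, "</strong>"], [1, "test"], [-1, "and"]];
console.log(sample.map(classify));
```

Note that rule 2 depends on the neighbouring entries, so a label of "content" only marks a candidate; whether it belongs to a group still has to be decided by looking at what comes before and after it.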

[1,"<ul><li>test</li></ul>"]

The array item contains the entire group <ul><li>test</li></ul>. That should also be included in the final result array.

Edit


I have updated my approach:
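One way to recognise such a self-contained entry is to check that every tag opened inside the string is also closed inside it. The sketch below assumes well-nested markup and ignores void elements such as `<br>`; the name `isSelfContained` is hypothetical.

```javascript
// Hypothetical helper: true when every tag opened in `html` is also closed in it.
// Assumes well-nested markup and ignores void elements such as <br>.
function isSelfContained(html) {
    const tagPattern = /<(\/?)([a-zA-Z][^\s<>\/]*)[^<>]*>/g;
    const stack = [];
    let sawTag = false;

    for (const match of html.matchAll(tagPattern)) {
        sawTag = true;
        const isClosing = match[1] === "/";
        const name = match[2].toLowerCase();
        if (!isClosing) {
            stack.push(name);
        } else if (stack.pop() !== name) {
            return false; // closing tag without a matching opener
        }
    }
    // A string without any tag is not a group on its own.
    return sawTag && stack.length === 0;
}

console.log(isSelfContained("<ul><li>test</li></ul>")); // true
console.log(isSelfContained("<strong>"));               // false
console.log(isSelfContained("test"));                   // false
```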

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1, "and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, " added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

checkAndRemoveGroups(a, 1);

function checkAndRemoveGroups(arr, group) {
    let htmlOpenRegex = /<([\w \d \s]+)([^<]+)([^<]+) *[^/?]>/g;
    let groupArray = new Array();
    let depth = 0;

    // Iterate the array to find out groups and push the items
    for (let i = 0; i < arr.length; i++) {
        if (arr[i][0] == group && arr[i][1].match(htmlOpenRegex)) {
            depth += 1;
            groupArray.push({ Index: i, Value: arr[i], TagType: "Open" });
        }
    }
    console.log(groupArray);
}

You could use an array as a stack of opening tags and check its length to see whether more closing tags are still required to close the outermost tag.

function getTags(string) {
    var regex = /<(\/?[^>]+)>/g,
        m,
        result = [];

    while ((m = regex.exec(string)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        result.push(m[1]);
    }
    return result;
}

var array = [
        [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
        [1, "<strong>"],
        [0, "the"],
        [1, "</strong>"],
        [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
        [-1, "and"],
        [1, "test"],
        [0, " scrambled it to make a type"],
        [1, " added"],
        [0, "</p>"],
        [1, "<ul><li>test</li></ul>"]
    ],
    result = [],
    nested = [],
    tags,
    i = 0;

while (i < array.length) {
    if (array[i][0] === 1) {
        tags = getTags(array[i][1]);
        if (!tags.length) {
            i++;
            continue;
        }
        result.push([]); // new group found
        while (i < array.length) {
            tags.forEach(function (t) {
                if (t.startsWith('/')) {
                    // closing tag: pop only if it matches the innermost open tag
                    if (nested[nested.length - 1] === t.slice(1)) {
                        nested.length--;
                    }
                    return;
                }
                nested.push(t);
            });
            result[result.length - 1].push(array[i]);
            if (!nested.length) {
                break;
            }
            i++;
            // stop if the input ends before the group is closed
            if (i >= array.length) {
                break;
            }
            tags = getTags(array[i][1]);
        }
    }
    i++;
}

console.log(result);

I'm with Scott... I think there must be a better way of doing what you want to do. I understand that you're trying to get things out of this array, but there's probably an entirely different approach to this problem where you don't have HTML nested inside of sub-arrays.

Edited: I misunderstood what you were looking for, so my original response didn't actually show you what was going wrong, and I removed it. I'm looking at this some more.

Is this exactly what you want to receive? I don't see how you're ever going to get [0,"the"] if you're checking every element against the HTML regex. And every element is going to be in its own object, which doesn't seem to be what you want.

let group = [
  {
    [1,"<strong>"],
    [0,"the"],
    [1,"</strong>"]
  },
  {
    [1,"<ul><li>test</li></ul>"]
  }
];  
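To get groups of that shape, the loop has to carry state from one element to the next instead of testing each element against a regex in isolation. The following is a rough sketch of that idea, not a definitive implementation: `countTags` and `extractGroups` are hypothetical names, the groups are returned as arrays, and the code assumes well-nested tags with no void elements such as `<br>`. It also keeps every entry that falls inside an open group, whatever its marker is.

```javascript
// Sketch only: `countTags` and `extractGroups` are hypothetical names.
// Assumes well-nested tags and no void elements such as <br>.
function countTags(html) {
    const tags = html.match(/<\/?[a-zA-Z][^<>]*>/g) || [];
    const closing = tags.filter((t) => t.startsWith("</")).length;
    return { opening: tags.length - closing, closing };
}

function extractGroups(entries, marker) {
    const groups = [];
    let current = null; // group being collected, or null when outside one
    let depth = 0;      // number of tags still waiting to be closed

    for (const entry of entries) {
        const { opening, closing } = countTags(entry[1]);
        if (current === null) {
            // A group can only start on a marked entry that opens a tag.
            if (entry[0] !== marker || opening === 0) continue;
            current = [];
        }
        current.push(entry); // entries inside an open group are kept as-is
        depth += opening - closing;
        if (depth <= 0) {
            groups.push(current);
            current = null;
            depth = 0;
        }
    }
    return groups; // a trailing group that is never closed is dropped
}

const entries = [
    [0, "<p>intro "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [1, "test"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];
console.log(JSON.stringify(extractGroups(entries, 1)));
```

Here `[0, "the"]` ends up in the first group only because `depth` is still 1 when the loop reaches it; a regex test on the string "the" alone would never select it.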
