
Not able to extract groups of start and end HTML tags in JavaScript array

I have this JavaScript array:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1,"and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, "  added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

I am trying to extract groups from the array based on the following condition.

Take a subarray of the above array as an example:

[1, "<strong>"],
[0, "the"],
[1, "</strong>"]

This sub-array is a group because its first element has a[0] == 1 and its a[1] is the start of an HTML tag: a[1] contains <strong>, which is a valid opening tag. So I want to collect the elements starting at the opening tag and ending at the matching closing tag.

For example, the result should contain these two groups:

let group = [
  [
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"]
  ],
  [
    [1, "<ul><li>test</li></ul>"]
  ]
];
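The opening-tag/closing-tag test in the condition above can be sketched with two small helpers. This is a minimal sketch, not code from the question: `openingTagName` and `closingTagName` are hypothetical names, and the regexes assume each fragment holds exactly one tag with simple syntax (a letter-led name plus optional attributes). Self-closing tags such as `<br/>` are deliberately treated as neither.

```javascript
// Hypothetical helpers (not from the question). They assume each fragment
// holds exactly one tag with simple syntax, e.g. "<strong>" or "</strong>".
const OPENING_TAG = /^<([a-zA-Z][\w-]*)(\s[^>]*)?>$/;
const CLOSING_TAG = /^<\/([a-zA-Z][\w-]*)\s*>$/;

// Returns the lower-cased tag name if `text` is a lone opening tag, else null.
// Self-closing tags such as "<br/>" or "<br />" are rejected.
function openingTagName(text) {
  const m = OPENING_TAG.exec(text);
  return m && !text.endsWith("/>") ? m[1].toLowerCase() : null;
}

// Returns the lower-cased tag name if `text` is a lone closing tag, else null.
function closingTagName(text) {
  const m = CLOSING_TAG.exec(text);
  return m ? m[1].toLowerCase() : null;
}

console.log(openingTagName("<strong>"));  // strong
console.log(closingTagName("</strong>")); // strong
console.log(openingTagName("the"));       // null
```

Returning the tag name instead of a boolean makes it easy to check later that a closing tag matches the opening one.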

I want to extract groups that satisfy the following rules:

  1. The element's first item is 1, i.e. a[i][0] == 1, and a[i][1] is a valid HTML opening tag.
  2. The element's first item is 0, i.e. a[i][0] == 0, and the element lies between an element matching rule 1 and an element matching rule 3.
  3. The element's first item is 1, i.e. a[i][0] == 1, and a[i][1] is a valid HTML closing tag.

Together, these three rules define one group, i.e. one entry of the result.
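Taken literally, the three rules describe a fixed pattern: one opening tag, a run of 0-elements, then the matching closing tag. A minimal sketch of that pattern follows; `groupsByRules` is a hypothetical name, and the sketch assumes single-tag fragments with simple syntax and ignores self-closing tags.

```javascript
// Sketch of rules 1-3 as written in the question (names are hypothetical):
//   1. an op-1 element that is exactly one opening tag, e.g. "<strong>"
//   2. any number of op-0 elements
//   3. an op-1 element that is the matching closing tag, e.g. "</strong>"
// Assumes single-tag fragments with simple syntax; self-closing tags are ignored.
const OPEN_TAG = /^<([a-zA-Z][\w-]*)(\s[^>]*)?>$/;
const CLOSE_TAG = /^<\/([a-zA-Z][\w-]*)\s*>$/;

function groupsByRules(arr) {
  const groups = [];
  for (let i = 0; i < arr.length; i++) {
    // Rule 1: op 1 and a lone opening tag.
    const open =
      arr[i][0] === 1 && !arr[i][1].endsWith("/>") ? OPEN_TAG.exec(arr[i][1]) : null;
    if (!open) continue;

    // Rule 2: skip over the op-0 elements that follow.
    let j = i + 1;
    while (j < arr.length && arr[j][0] === 0) j++;

    // Rule 3: op 1 and the closing tag with the same name.
    const close = j < arr.length && arr[j][0] === 1 ? CLOSE_TAG.exec(arr[j][1]) : null;
    if (close && close[1].toLowerCase() === open[1].toLowerCase()) {
      groups.push(arr.slice(i, j + 1));
      i = j; // resume scanning after the closing tag
    }
  }
  return groups;
}

const sample = [
  [0, "<p>intro "],
  [1, "<strong>"],
  [0, "the"],
  [1, "</strong>"],
  [1, "<ul><li>test</li></ul>"],
];
console.log(JSON.stringify(groupsByRules(sample)));
// [[[1,"<strong>"],[0,"the"],[1,"</strong>"]]]
```

Note that the single-string `<ul><li>test</li></ul>` element is not matched by these three rules; it needs a separate check.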

There can also be a case like:

[1,"<ul><li>test</li></ul>"]

This array item contains the entire group <ul><li>test</li></ul> in a single string. It should also be included in the final result array.
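Whether a single string already holds a complete group can be checked with a stack of tag names. This is a minimal sketch under stated assumptions: `isSelfContainedGroup` is a hypothetical name, the string must begin with an opening tag, and there are no self-closing/void tags and no `>` inside attribute values.

```javascript
// Hypothetical helper: true when the fragment starts with an opening tag and
// every tag in it is closed within the same string, e.g. "<ul><li>test</li></ul>".
// Assumes no self-closing/void tags and no ">" inside attribute values.
function isSelfContainedGroup(text) {
  if (!/^<[a-zA-Z]/.test(text)) return false; // must begin with an opening tag
  const tagRe = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  const stack = [];
  for (const m of text.matchAll(tagRe)) {
    const name = m[2].toLowerCase();
    if (m[1] !== "/") {
      stack.push(name); // opening tag
    } else if (stack.pop() !== name) {
      return false; // closing tag that does not match the innermost opener
    }
  }
  return stack.length === 0; // balanced once every opener has been closed
}

console.log(isSelfContainedGroup("<ul><li>test</li></ul>")); // true
console.log(isSelfContainedGroup("<strong>"));               // false
```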

Edit


I have updated my approach:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1, "and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, " added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

checkAndRemoveGroups(a, 1);

function checkAndRemoveGroups(arr, group) {
    let htmlOpenRegex = /<([\w \d \s]+)([^<]+)([^<]+) *[^/?]>/g;
    let groupArray = new Array();
    let depth = 0;
    //Iterate the array to find out groups and push the items
    for (let i = 0; i < arr.length; i++) {
        if (arr[i][0] == group && arr[i][1].match(htmlOpenRegex)) {
            depth += 1;
            groupArray.push({ Index: i, Value: arr[i], TagType: "Open" });
        }
    }
    console.log(groupArray);
}
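The `depth` variable above is only ever incremented, so the loop has no way to tell where a group ends. One way to finish the idea is to also decrement on closing tags and keep collecting elements until the depth is back to 0. This is a sketch under assumptions, not the question's code: `checkAndCollectGroups` is a hypothetical name, tags are assumed properly nested, tag names are not compared, self-closing tags are not handled, and a group still open at the end of the array is discarded.

```javascript
// Sketch building on the question's depth counter (names are hypothetical).
// Depth goes up for each opening tag and down for each closing tag found in
// elements whose first item equals `group`; elements are collected until the
// depth returns to 0. Assumptions: tags are properly nested, tag names are not
// compared, there are no self-closing tags, and a group left open at the end
// of the array is discarded.
function checkAndCollectGroups(arr, group) {
  const groups = [];
  let current = null; // the group being collected, or null between groups
  let depth = 0;
  for (const item of arr) {
    const [op, text] = item;
    if (current === null) {
      // A group can only begin at a `group` element that starts with an opening tag.
      if (op !== group || !/^<[a-zA-Z]/.test(text)) continue;
      current = [];
    }
    current.push(item);
    if (op === group) {
      depth += (text.match(/<[a-zA-Z][^>]*>/g) || []).length; // opening tags
      depth -= (text.match(/<\/[^>]*>/g) || []).length;       // closing tags
    }
    if (depth <= 0) {
      // Every tag opened in this group has been closed.
      groups.push(current);
      current = null;
      depth = 0;
    }
  }
  return groups;
}

const sample = [
  [1, "<strong>"],
  [0, "the"],
  [1, "</strong>"],
  [1, " added"],
  [1, "<ul><li>test</li></ul>"],
];
console.log(JSON.stringify(checkAndCollectGroups(sample, 1)));
// [[[1,"<strong>"],[0,"the"],[1,"</strong>"]],[[1,"<ul><li>test</li></ul>"]]]
```

A plain counter is simpler than a stack of names but cannot detect mismatched tags such as `<b>...</i>`.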

You could keep an array of the currently open tags and check its length to see whether more closing tags are still needed to close the top-level tag.

function getTags(string) {
    var regex = /<(\/?[^>]+)>/g,
        m,
        result = [];

    while ((m = regex.exec(string)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        result.push(m[1]);
    }
    return result;
}

var array = [
        [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
        [1, "<strong>"],
        [0, "the"],
        [1, "</strong>"],
        [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
        [-1, "and"],
        [1, "test"],
        [0, " scrambled it to make a type"],
        [1, " added"],
        [0, "</p>"],
        [1, "<ul><li>test</li></ul>"]
    ],
    result = [],
    nested = [],
    tags,
    i = 0;

while (i < array.length) {
    if (array[i][0] === 1) {
        tags = getTags(array[i][1]);
        if (!tags.length) {
            i++;
            continue;
        }
        result.push([]); // new group found
        while (i < array.length) {
            tags.forEach(function (t) {
                if (t.startsWith('/')) {
                    if (nested[nested.length - 1] === t.slice(1)) {
                        nested.length--;
                    }
                    return;
                }
                nested.push(t);
            });
            result[result.length - 1].push(array[i]);
            if (!nested.length) {
                break;
            }
            i++;
            // Guard: a group that is never closed would otherwise read past the end
            if (i < array.length) {
                tags = getTags(array[i][1]);
            }
        }
    }
    i++;
}

console.log(result);
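The same stack-of-open-tags technique can be packaged as a reusable function. This is a sketch, not the answer's exact code: `extractGroups` is a hypothetical name, the stack is local to each group, `getTags` is rewritten with `String.prototype.matchAll`, and a group still open at the end of the array is returned as-is. Like the answer, it reads tags from every element inside a group regardless of its first item, and it compares the full captured text, so a tag with attributes (e.g. `<b class="x">`) would not be matched by `</b>`.

```javascript
// Sketch: the answer's stack approach wrapped in a reusable function.
// Differences from the answer (assumptions of this sketch): the stack of open
// tags is local to each group, getTags uses String.prototype.matchAll, and a
// group still open at the end of the array is returned instead of crashing.
function getTags(string) {
  // Captures "strong" from "<strong>" and "/strong" from "</strong>".
  return Array.from(string.matchAll(/<(\/?[^>]+)>/g), (m) => m[1]);
}

function extractGroups(array) {
  const result = [];
  let i = 0;
  while (i < array.length) {
    if (array[i][0] !== 1 || getTags(array[i][1]).length === 0) {
      i++; // not the start of a group
      continue;
    }
    const group = [];
    const nested = []; // open tag names still waiting for a closing tag
    do {
      for (const t of getTags(array[i][1])) {
        if (!t.startsWith("/")) {
          nested.push(t);
        } else if (nested[nested.length - 1] === t.slice(1)) {
          nested.pop(); // closing tag matches the innermost open tag
        }
      }
      group.push(array[i]);
      i++;
    } while (nested.length > 0 && i < array.length);
    result.push(group);
  }
  return result;
}

console.log(JSON.stringify(extractGroups([[1, "<b>"], [0, "x"], [1, "</b>"]])));
// [[[1,"<b>"],[0,"x"],[1,"</b>"]]]
```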

I'm with Scott... I think there must be a better way of doing what you want. I understand that you're trying to get things out of this array, but there is probably an entirely different approach to this problem where you don't have HTML nested inside sub-arrays.

Edit: I misunderstood what you were looking for, so my original response didn't actually show you what was going wrong, and I removed it. I'm looking into this some more.

Is this exactly what you want to get back? I don't see how you will ever get [0,"the"] if you check every element against the HTML regex. Also, every matching element ends up in its own object, which doesn't seem to be what you want.

let group = [
  [
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"]
  ],
  [
    [1, "<ul><li>test</li></ul>"]
  ]
];
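The point about testing each element on its own can be illustrated in a few lines. The regex below is a simplified stand-in, not the one from the question; the `{ Index, Value }` shape mirrors the objects the question pushes.

```javascript
// Illustration of the point above (simplified stand-in regex, not the one
// from the question): testing each element on its own against a tag regex
// keeps only the elements that contain tags, so [0, "the"] is never selected,
// and each match becomes a separate entry.
const tagRe = /<\/?[a-zA-Z][^>]*>/; // no "g" flag, so .test() keeps no lastIndex state
const sample = [[1, "<strong>"], [0, "the"], [1, "</strong>"]];

const matched = sample
  .map((item, index) => ({ Index: index, Value: item }))
  .filter(({ Value }) => Value[0] === 1 && tagRe.test(Value[1]));

console.log(JSON.stringify(matched));
// [{"Index":0,"Value":[1,"<strong>"]},{"Index":2,"Value":[1,"</strong>"]}]
```

Collecting the elements *between* an opening and a closing tag needs state carried across iterations (a depth counter or a stack), not a per-element test.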

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM