
Not able to extract groups of start and end HTML tags in JavaScript array

I have this JavaScript array:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1,"and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, "  added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

I am trying to extract groups of array entries based on the conditions below.

Take this slice of the array above as an example:

[1, "<strong>"],
[0, "the"],
[1, "</strong>"]

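This slice is one qualifying group. The condition is that an entry's first element is 1 (a[i][0] == 1) and its second element (a[i][1]) is the start of an HTML tag. Here a[i][1] contains <strong>, which is the start of a valid HTML tag, so I want to push every entry from the opening tag through to the closing tag.

The groups I expect are shown below: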

let group = [
  {
    [1,"<strong>"],
    [0,"the"],
    [1,"</strong>"]
  },
  {
    [1,"<ul><li>test</li></ul>"]
  }
];  

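I want to extract the groups based on the following conditions:

  1. The entry's first element is 1, i.e. a[i][0] == 1, and a[i][1] is the start of a valid HTML tag.
  2. The entry's first element is 0, i.e. a[i][0] == 0, and it lies between an entry matching rule 1 and an entry matching rule 3.
  3. The entry's first element is 1, i.e. a[i][0] == 1, and a[i][1] is the end of a valid HTML tag.

Together, these three rules make up one group, or one JavaScript object.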
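The three rules can be read as a small classifier over each entry. The following is a minimal sketch rather than a definitive implementation: `classifyEntry` and its return labels are hypothetical names, and the patterns assume simple tags with no `>` inside attribute values.

```javascript
// Hypothetical helper: label one [flag, text] entry according to the three rules.
// Assumes simple markup; a regular expression is not a full HTML parser.
const OPEN_TAG = /^<[a-zA-Z][^<>]*>$/;    // a lone opening tag, e.g. "<strong>"
const CLOSE_TAG = /^<\/[a-zA-Z][^<>]*>$/; // a lone closing tag, e.g. "</strong>"

function classifyEntry(entry) {
    const [flag, text] = entry;
    const trimmed = text.trim();
    if (flag === 1 && CLOSE_TAG.test(trimmed)) return "close"; // rule 3
    if (flag === 1 && OPEN_TAG.test(trimmed)) return "open";   // rule 1
    if (flag === 0) return "content";                          // candidate for rule 2
    return "other";                                            // e.g. [1, "test"] or [-1, "and"]
}

console.log(classifyEntry([1, "<strong>"]));  // "open"
console.log(classifyEntry([0, "the"]));       // "content"
console.log(classifyEntry([1, "</strong>"])); // "close"
```

An entry labelled "content" only belongs to a group when it sits between an "open" and a "close" entry, so the labels still have to be combined with a scan over the array.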

There can also be a case like this:

[1,"<ul><li>test</li></ul>"]

Here a single array item contains the whole group <ul><li>test</li></ul>. This should also be included in the final result array.
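One way to recognise such a self-contained item is to check that every tag opened inside the string is also closed inside it. This is a sketch under stated assumptions: `isSelfContained` is a hypothetical helper, tag names are compared case-sensitively, and void elements such as `<br>` are not handled.

```javascript
// Hypothetical helper: true when the text contains at least one tag and every
// opening tag is closed, in order, within the same string.
function isSelfContained(text) {
    const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>/g;
    const open = []; // stack of currently open tag names
    let sawTag = false;

    for (const match of text.matchAll(tagPattern)) {
        sawTag = true;
        const [, slash, name] = match;
        if (slash === "/") {
            // A closing tag must match the most recently opened tag.
            if (open.pop() !== name) return false;
        } else {
            open.push(name);
        }
    }
    return sawTag && open.length === 0;
}

console.log(isSelfContained("<ul><li>test</li></ul>")); // true
console.log(isSelfContained("<strong>"));               // false
```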

Edit

I have updated my approach:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1, "and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, " added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

checkAndRemoveGroups(a, 1);

function checkAndRemoveGroups(arr, group) {
    let htmlOpenRegex = /<([\w \d \s]+)([^<]+)([^<]+) *[^/?]>/g;
    let groupArray = new Array();
    let depth = 0;

    // Iterate the array to find out groups and push the items
    for (let i = 0; i < arr.length; i++) {
        if (arr[i][0] == group && arr[i][1].match(htmlOpenRegex)) {
            depth += 1;
            groupArray.push({
                Index: i,
                Value: arr[i],
                TagType: "Open"
            });
        }
    }

    console.log(groupArray);
}

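You could use an array as a stack for the opening and closing tags, and check its length to see whether more tags are still needed to close the top-level tag.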

function getTags(string) {
    var regex = /<(\/?[^>]+)>/g,
        m,
        result = [];

    while ((m = regex.exec(string)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        result.push(m[1]);
    }
    return result;
}

var array = [
        [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
        [1, "<strong>"],
        [0, "the"],
        [1, "</strong>"],
        [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
        [-1, "and"],
        [1, "test"],
        [0, " scrambled it to make a type"],
        [1, " added"],
        [0, "</p>"],
        [1, "<ul><li>test</li></ul>"]
    ],
    result = [],
    nested = [],
    tags,
    i = 0;

while (i < array.length) {
    if (array[i][0] === 1) {
        tags = getTags(array[i][1]);
        if (!tags.length) {
            i++;
            continue;
        }
        result.push([]); // new group found
        while (i < array.length) {
            tags.forEach(function (t) {
                if (t.startsWith('/')) {
                    // a closing tag pops the stack only if it matches the top entry
                    if (nested[nested.length - 1] === t.slice(1)) {
                        nested.length--;
                    }
                    return;
                }
                nested.push(t);
            });
            result[result.length - 1].push(array[i]);
            if (!nested.length) {
                break; // every opened tag is closed, so the group is complete
            }
            i++;
            // guard against reading past the end when a tag is never closed
            if (i < array.length) {
                tags = getTags(array[i][1]);
            }
        }
    }
    i++;
}

console.log(result);
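The same stack idea can be wrapped in a reusable function. This is a sketch, not the answerer's code: `extractGroups` and its `flag` parameter are hypothetical names, the pattern captures only the tag name (so attributes are ignored when matching closers), and, unlike the loop above, a group may only start at an entry that contains an opening tag. A group whose tags are never closed runs to the end of the input.

```javascript
// Hypothetical reusable variant of the stack-based grouping.
function extractGroups(entries, flag = 1) {
    const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>/g;
    const groups = [];
    let i = 0;

    while (i < entries.length) {
        const [entryFlag, text] = entries[i];
        // A group starts only at a flagged entry containing an opening tag.
        if (entryFlag !== flag || !/<[a-zA-Z]/.test(text)) {
            i++;
            continue;
        }
        const open = [];  // stack of currently open tag names
        const group = [];
        for (; i < entries.length; i++) {
            for (const [, slash, name] of entries[i][1].matchAll(tagPattern)) {
                if (slash !== "/") open.push(name);
                else if (open[open.length - 1] === name) open.pop();
            }
            group.push(entries[i]);
            if (open.length === 0) break; // all opened tags are closed
        }
        groups.push(group);
        i++;
    }
    return groups;
}

const sample = [
    [0, "<p>intro "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [1, "test"],
    [1, "<ul><li>test</li></ul>"]
];
console.log(JSON.stringify(extractGroups(sample)));
// [[[1,"<strong>"],[0,"the"],[1,"</strong>"]],[[1,"<ul><li>test</li></ul>"]]]
```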

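I'm with Scott... I think there has to be a better way to do what you are trying to do. I understand you are trying to pull something out of this array, but there may be an entirely different way to approach the problem in which the HTML is not nested inside sub-arrays.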

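- Edited - I misunderstood what you were looking for, so my original reply did not actually show you what was going wrong, and I have removed it. Take another look at this.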

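Is this exactly what you want to receive? If you check each element against the HTML regular expression, I don't see how you would ever get [0,"the"]. Each matching element would end up in its own object, which does not seem to be what you want.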

let group = [
  {
    [1,"<strong>"],
    [0,"the"],
    [1,"</strong>"]
  },
  {
    [1,"<ul><li>test</li></ul>"]
  }
];  
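To illustrate the concern with a small sketch (the `hasTag` helper and its pattern are assumptions, not the asker's regex): filtering entries one at a time keeps only the entries that themselves contain markup, so the text between the tags is lost.

```javascript
// Per-element check: each entry is tested on its own, with no memory of
// whether an opening tag has been seen earlier.
const entries = [[1, "<strong>"], [0, "the"], [1, "</strong>"]];
const hasTag = (text) => /<\/?[a-zA-Z][^<>]*>/.test(text);

const matched = entries.filter(([flag, text]) => flag === 1 && hasTag(text));

console.log(JSON.stringify(matched));
// [[1,"<strong>"],[1,"</strong>"]]
```

Keeping `[0, "the"]` requires tracking state across entries, for example a stack of open tags.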
