
Using JavaScript, how do I transform an HTML string into an array of HTML tags and text content?

I have an HTML string, for example:

<p>
    <strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.
</p>

I want to convert it into a JavaScript array like this:

['<p>', '<strong>', '<em>', 'Lorem Ipsum ', '</em>', '</strong>', 'is simply dummy text of the printing ', '<em>', 'and', '</em>', 'typesetting industry.', '</p>']

In other words, it should take the HTML string and break it into an array of tags and text content.

Based on this question, I tried using DOMParser():

const str = `<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;

const doc = new DOMParser().parseFromString(str, 'text/html');
const arr = [...doc.body.childNodes]
  .map(child => child.outerHTML || child.textContent);

However, this just returns:

['<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>']

I also searched for various regex-based solutions, but couldn't find one that splits the string exactly the way I want.
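For simple, well-formed fragments like the one above, a regular-expression split produces nearly this array. A minimal sketch, assuming the markup contains no comments, no <script>/<style> content, and no ">" inside attribute values (the name splitHtml is just illustrative):

```javascript
// Split an HTML string into tag tokens and text tokens.
// The capturing group makes String.prototype.split keep each matched tag in the result;
// the empty strings that appear between adjacent tags are then filtered out.
// Assumption: simple markup only (no '>' in attribute values, no comments or scripts).
function splitHtml(html) {
  return html.split(/(<[^>]*>)/).filter((token) => token !== '');
}

const str = '<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>';
console.log(splitHtml(str));
```

The last text token comes out as ' typesetting industry.' with its leading space, as in the source string. For markup that is less predictable, a real parser such as DOMParser is the safer choice.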

Any suggestions?

Thanks

I would write a recursive function that iterates over the given node and returns an array of text representations of its children:

const str = `<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;
const doc = new DOMParser().parseFromString(str, 'text/html');

// Recursively turn a node's children into opening tags, text, and closing tags.
const parseNode = node => {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === Node.ELEMENT_NODE) {
      // tagName is uppercase in HTML documents ("P"); localName gives "p".
      output.push(`<${child.localName}>`);
      output.push(...parseNode(child));
      output.push(`</${child.localName}>`);
    }
  }
  return output;
};

console.log(parseNode(doc.body));
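The question's first example is indented across several lines, so parsing it yields whitespace-only text nodes (such as "\n    ") that the function above pushes into the array as well. Below is a sketch of a variant that drops them, written against plain DOM-like objects so it also runs outside a browser. It assumes the standard numeric nodeType values (1 for elements, 3 for text) and uses localName, which is lowercase for HTML elements; parseNodeTrimmed and the text/el mock helpers are hypothetical names:

```javascript
// Standard DOM nodeType values (Node.ELEMENT_NODE and Node.TEXT_NODE),
// written out so the sketch does not depend on a browser global.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Flatten a DOM-like node into tags and text, skipping text nodes
// that contain only whitespace (newlines, indentation).
function parseNodeTrimmed(node) {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      if (child.textContent.trim() !== '') output.push(child.textContent);
    } else if (child.nodeType === ELEMENT_NODE) {
      const tag = child.localName;
      output.push(`<${tag}>`, ...parseNodeTrimmed(child), `</${tag}>`);
    }
  }
  return output;
}

// Hypothetical stand-ins for DOM nodes, mirroring the indented example.
const text = (s) => ({ nodeType: TEXT_NODE, textContent: s, childNodes: [] });
const el = (name, ...childNodes) => ({ nodeType: ELEMENT_NODE, localName: name, childNodes });

const body = el('body',
  el('p',
    text('\n    '),
    el('strong', el('em', text('Lorem Ipsum '))),
    text('is simply dummy text of the printing '),
    el('em', text('and')),
    text(' typesetting industry.\n')));

console.log(parseNodeTrimmed(body));
```

In a browser you would pass doc.body from DOMParser instead of the mock tree. Text that merely starts or ends with whitespace is kept unchanged; only entirely blank text nodes are skipped.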

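If you also need to keep attributes, you can take the element's outerHTML and extract the leading run of non-">" characters (the rest of the opening tag after the tag name):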

const str = `<p style="color:green"><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;
const doc = new DOMParser().parseFromString(str, 'text/html');

const parseNode = node => {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === Node.ELEMENT_NODE) {
      // Attribute text of the serialized opening tag (assumes no ">" inside attribute values).
      const attribs = child.outerHTML.match(/<\s*[^>\s]+([^>]*)/)[1];
      output.push(`<${child.localName}${attribs}>`);
      output.push(...parseNode(child));
      output.push(`</${child.localName}>`);
    }
  }
  return output;
};

console.log(parseNode(doc.body));
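Matching outerHTML with a regex breaks if an attribute value contains ">". A sketch that rebuilds the opening tag from the element's attributes collection instead (each Attr has a name and a value), escaping & and " so each value stays inside its quotes. It is again written against DOM-like objects with the standard nodeType numbers; openingTag, parseNodeWithAttributes and the text/el mock helpers are hypothetical:

```javascript
const ELEMENT_NODE = 1; // Node.ELEMENT_NODE
const TEXT_NODE = 3;    // Node.TEXT_NODE

// Build '<tag name="value" ...>' from the element's attributes rather than outerHTML.
// Array.from accepts array-likes such as a NamedNodeMap.
function openingTag(element) {
  const attrs = Array.from(element.attributes, ({ name, value }) =>
    ` ${name}="${value.replace(/&/g, '&amp;').replace(/"/g, '&quot;')}"`);
  return `<${element.localName}${attrs.join('')}>`;
}

function parseNodeWithAttributes(node) {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === ELEMENT_NODE) {
      output.push(openingTag(child), ...parseNodeWithAttributes(child), `</${child.localName}>`);
    }
  }
  return output;
}

// Hypothetical stand-ins for DOM nodes; attributes is a list of { name, value }.
const text = (s) => ({ nodeType: TEXT_NODE, textContent: s, childNodes: [] });
const el = (name, attributes, ...childNodes) =>
  ({ nodeType: ELEMENT_NODE, localName: name, attributes, childNodes });

const body = el('body', [],
  el('p', [{ name: 'style', value: 'color:green' }, { name: 'title', value: 'say "hi"' }],
    el('em', [], text('and')),
    text(' typesetting industry.')));

console.log(parseNodeWithAttributes(body));
```

In a browser, parseNodeWithAttributes(doc.body) works on the real element.attributes collection, so no regex over serialized markup is needed.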

If self-closing tags such as <input> should not be expanded with a closing tag, check whether the element's outerHTML contains "</":

const str = `<p style="color:green"><input readonly value="x"/><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;
const doc = new DOMParser().parseFromString(str, 'text/html');

const parseNode = node => {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === Node.ELEMENT_NODE) {
      // Attribute text of the serialized opening tag (assumes no ">" inside attribute values).
      const attribs = child.outerHTML.match(/<\s*[^>\s]+([^>]*)/)[1];
      output.push(`<${child.localName}${attribs}>`);
      // Void (self-closing) elements such as <input> serialize without "</",
      // so they get no children and no closing tag here.
      if (child.outerHTML.includes('</')) {
        output.push(...parseNode(child));
        output.push(`</${child.localName}>`);
      }
    }
  }
  return output;
};

console.log(parseNode(doc.body));
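The includes('</') test depends on serialization details; for example, an attribute value containing "</" may make a void element look as if it had a closing tag. An alternative sketch checks the tag name against the void elements listed in the HTML Living Standard, which never have end tags. Attributes are left out to keep it short; parseNodeVoidAware and the text/el mock helpers are hypothetical, and the nodeType numbers are the standard DOM values:

```javascript
const ELEMENT_NODE = 1; // Node.ELEMENT_NODE
const TEXT_NODE = 3;    // Node.TEXT_NODE

// Void elements per the HTML Living Standard: they have no end tag and no children.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

function parseNodeVoidAware(node) {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === ELEMENT_NODE) {
      const tag = child.localName;
      output.push(`<${tag}>`);
      // Only non-void elements get their children and a closing tag.
      if (!VOID_ELEMENTS.has(tag)) {
        output.push(...parseNodeVoidAware(child), `</${tag}>`);
      }
    }
  }
  return output;
}

// Hypothetical stand-ins for DOM nodes.
const text = (s) => ({ nodeType: TEXT_NODE, textContent: s, childNodes: [] });
const el = (name, ...childNodes) => ({ nodeType: ELEMENT_NODE, localName: name, childNodes });

const body = el('body', el('p', el('input'), text('Name'), el('br'), el('em', text('x'))));
console.log(parseNodeVoidAware(body));
```

Because the decision depends only on the tag name, it is unaffected by how a particular browser serializes attribute values.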

