Using JavaScript, how do I transform an HTML string into an array of HTML tags and text content?

I have an HTML string such as:

<p>
    <strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.
</p>

I want to convert this into a JavaScript array that looks like:

['<p>', '<strong>', '<em>', 'Lorem Ipsum ', '</em>', '</strong>', 'is simply dummy text of the printing ', '<em>', 'and', '</em>', 'typesetting industry.', '</p>']

That is, it takes the HTML string and breaks it down into an array of tags and text content.

I have tried to use DOMParser, as per this question:

const str = `<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;

const doc = new DOMParser().parseFromString(str, 'text/html');
const arr = [...doc.body.childNodes]
  .map(child => child.outerHTML || child.textContent);

However, this simply returns:

['<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>']

I have also searched for various regex-based solutions, but haven't been able to find any that break the string down exactly as I require.

Any suggestions?

Thanks

I'd make a recursive function that iterates over a given node and returns an array of the text representation of its children:

const str = `<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;

const doc = new DOMParser().parseFromString(str, 'text/html');

const parseNode = node => {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === Node.ELEMENT_NODE) {
      // tagName is upper case for HTML elements, so lower-case it
      // to get '<p>' rather than '<P>'
      const tag = child.tagName.toLowerCase();
      output.push(`<${tag}>`);
      output.push(...parseNode(child));
      output.push(`</${tag}>`);
    }
  }
  return output;
};

console.log(parseNode(doc.body));
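
One thing to watch for: the string in the question is spread over several lines, so parsing it produces whitespace-only text nodes (the newline and indentation right after the opening p tag, for instance), and those would end up in the array. The following is a minimal sketch, not part of the original answer, of a variant that skips such nodes. The name `tokenizeNode` and the two numeric constants are assumptions introduced here; the constants mirror `Node.ELEMENT_NODE` (1) and `Node.TEXT_NODE` (3) so the function does not depend on the global `Node`.

```javascript
// Numeric values of Node.ELEMENT_NODE and Node.TEXT_NODE, spelled out so the
// function does not depend on the global `Node` object.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper (not from the original answer): flattens a DOM subtree
// into tags and text, dropping text nodes that contain only whitespace.
const tokenizeNode = node => {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      if (child.textContent.trim() !== '') {
        output.push(child.textContent);
      }
    } else if (child.nodeType === ELEMENT_NODE) {
      const tag = child.tagName.toLowerCase();
      output.push(`<${tag}>`, ...tokenizeNode(child), `</${tag}>`);
    }
  }
  return output;
};

// Browser usage; guarded so the snippet also loads where DOMParser is absent.
if (typeof DOMParser !== 'undefined') {
  const multiLine = `<p>
    <strong><em>Lorem Ipsum </em></strong>is simply dummy text.
</p>`;
  const doc = new DOMParser().parseFromString(multiLine, 'text/html');
  console.log(tokenizeNode(doc.body));
}
```

Text nodes that are kept are pushed unchanged, so any leading or trailing whitespace inside them (such as a newline before the closing tag) is preserved; trim them as well if that matters for your use case.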

If you need to keep attributes too, you could take the outerHTML of the element and capture everything in its opening tag that follows the tag name, up to the first closing bracket:

const str = `<p style="color:green"><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;

const doc = new DOMParser().parseFromString(str, 'text/html');

const parseNode = node => {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === Node.ELEMENT_NODE) {
      // tagName is upper case for HTML elements, so lower-case it
      const tag = child.tagName.toLowerCase();
      // Capture whatever follows the tag name inside the opening tag.
      // Note: this stops early if an attribute value contains '>'.
      const attribs = child.outerHTML.match(/<\s*[^>\s]+([^>]*)/)[1];
      output.push(`<${tag}${attribs}>`);
      output.push(...parseNode(child));
      output.push(`</${tag}>`);
    }
  }
  return output;
};

console.log(parseNode(doc.body));

If you need self-closing (void) elements such as input not to be expanded into an opening and closing pair, check whether the outerHTML of the element contains </ :

const str = `<p style="color:green"><input readonly value="x"/><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;

const doc = new DOMParser().parseFromString(str, 'text/html');

const parseNode = node => {
  const output = [];
  for (const child of node.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      output.push(child.textContent);
    } else if (child.nodeType === Node.ELEMENT_NODE) {
      // tagName is upper case for HTML elements, so lower-case it
      const tag = child.tagName.toLowerCase();
      const attribs = child.outerHTML.match(/<\s*[^>\s]+([^>]*)/)[1];
      output.push(`<${tag}${attribs}>`);
      if (child.outerHTML.includes('</')) {
        // Not self closing:
        output.push(...parseNode(child));
        output.push(`</${tag}>`);
      }
    }
  }
  return output;
};

console.log(parseNode(doc.body));
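
The question also mentions regex-based approaches. For simple, well-formed fragments like the one in the question, splitting on a capturing group can produce a similar array without building a DOM. This is a rough sketch rather than a recommendation: the function name `splitHtml` is made up here, and the pattern assumes there is no `>` inside attribute values and no comments, script or style content, so the DOMParser approach above remains the more robust choice.

```javascript
// Hypothetical helper: split an HTML string into tags and text runs.
// Because the pattern is wrapped in a capturing group, String.prototype.split
// keeps the matched tags in the resulting array.
const splitHtml = html =>
  html
    .split(/(<[^>]+>)/)
    // Drop the empty strings produced between adjacent tags, as well as
    // whitespace-only runs such as indentation.
    .filter(part => part.trim() !== '');

const sample = `<p><strong><em>Lorem Ipsum </em></strong>is simply dummy text of the printing <em>and</em> typesetting industry.</p>`;

console.log(splitHtml(sample));
```

Unlike the DOM-based versions, this keeps each tag exactly as written in the source (attributes, case and any trailing slash included), and it does not decode entities or repair malformed markup.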

