Remove HTML attributes from an HTML string value with a regular expression

Question

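I need to remove HTML attributes from an HTML string. I have some formatted text input fields that let users copy and paste text while keeping basic HTML. The problem is that some text copied from Word documents comes with attributes that need to be removed. The regex I'm currently using works in a regex tester, but no attributes are removed.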

Code to remove the attributes:

 var stringhtml = '<div class="Paragraph BCX0 SCXW244271589" paraid="1364880375" paraeid="{8e523337-60c9-4b0d-8c73-fb1a70a2ba58}{165}" style="margin-bottom: 0px;margin-left:96px;padding:0px;user-select:text;-webkit-user-drag:none;-webkit-tap-highlight-color:transparent; overflow-wrap: break-word;">some text</div>';
 var regex = /[a-zA-Z]*=".*?"/;
 var replacedstring = stringhtml.replace(regex, '');
 document.write(replacedstring);

Any help is appreciated!

Answer 1

There is a lot of literature on why parsing HTML with regular expressions can be quite risky; this famous StackOverflow question is a good example.

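As @Polymer pointed out, your current regex will miss attributes with single quotes, but there are other cases too: data attributes such as data-id="233" are only partly removed (the regex matches id="233" and leaves data- behind), and attributes without a quoted value, such as the boolean attribute disabled, are missed entirely. There are probably more!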
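A quick way to see these gaps is to run the question's pattern over a few small samples. Note also that the regex in the question has no g flag, so String.prototype.replace removes only the first match (class="..." in the question's string) and leaves the style attribute, which carries the visible formatting, in place; that may be why it looks as if nothing was removed. The sample strings below are made up for illustration.

```javascript
// The regular expression from the question, exactly as written (no g flag)...
const originalRegex = /[a-zA-Z]*=".*?"/;
// ...and the same pattern with the g flag, so replace() removes every match.
const globalRegex = /[a-zA-Z]*=".*?"/g;

// Hypothetical samples for the cases discussed above.
const samples = {
  twoAttributes: '<div class="a" id="b">x</div>',
  singleQuotes: "<div class='a'>x</div>",
  dataAttribute: '<div data-id="233">x</div>',
  booleanAttribute: '<input disabled>',
};

// Without g, only class="a" is removed; id="b" survives.
console.log(samples.twoAttributes.replace(originalRegex, ''));

// Even with g, the other cases are missed or only partly removed.
for (const [label, html] of Object.entries(samples)) {
  console.log(label, '->', html.replace(globalRegex, ''));
}
```

With the g flag the two-attribute sample becomes "<div  >x</div>" (the separating spaces stay behind), the single-quoted and boolean samples come back unchanged, and the data attribute is reduced to "<div data->x</div>".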

With this approach you may end up constantly playing catch-up, having to change your regex every time you run into a new combination in your HTML.
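To make the catch-up problem concrete, here is a sketch that is not part of the original answer: instead of matching attributes one by one, it keeps each opening tag's name and drops everything else inside the tag. That handles single quotes, data-* attributes and boolean attributes like disabled, yet a perfectly valid value containing ">" (for example title="a>b") already breaks it. The pattern and the function name stripAttributesWithRegex are hypothetical.

```javascript
// Hypothetical tag-level pattern: "<", a tag name, whitespace, then everything
// up to the next ">" (where the attributes live), with an optional trailing "/"
// captured so self-closing tags stay self-closing.
const openingTagRegex = /<([a-zA-Z][\w:-]*)\s[^>]*?(\/?)>/g;

function stripAttributesWithRegex(html) {
  // Keep the tag name (and a trailing "/"), drop everything in between.
  // Closing tags such as </div> do not match, because "/" is not a letter.
  return html.replace(openingTagRegex, '<$1$2>');
}

console.log(stripAttributesWithRegex("<p class='a'>x<br data-x=\"1\" /></p>"));
console.log(stripAttributesWithRegex('<div title="a>b">x</div>'));
```

The first call returns "<p>x<br/></p>", but the second returns "<div>b\">x</div>", leaking part of the attribute into the text; fixing that means yet another change to the pattern.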

A safer approach may be to parse your string as HTML with the DOMParser API and extract the content from it that way:

 // The HTML string from the question (attributes copied from Word)
 let stringhtml = '<div class="Paragraph BCX0 SCXW244271589" paraid="1364880375" paraeid="{8e523337-60c9-4b0d-8c73-fb1a70a2ba58}{165}" style="margin-bottom: 0px;margin-left:96px;padding:0px;user-select:text;-webkit-user-drag:none;-webkit-tap-highlight-color:transparent; overflow-wrap: break-word;">some text</div>';
 // Parse the string into a separate document instead of matching it with a regex
 let parser = new DOMParser();
 let parsedResult = parser.parseFromString(stringhtml, 'text/html');
 // Create a fresh element with the same tag name but no attributes...
 let element = document.createElement(parsedResult.body.firstChild.tagName);
 // ...and copy over only the text content
 element.innerText = parsedResult.documentElement.textContent;
 console.log(element);
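The snippet above rebuilds a single element from the text only, so any nested markup the question wants to keep (bold text, line breaks, lists) is flattened. As a hedged variation on the same DOMParser idea, the sketch below instead removes every attribute from every parsed element and returns the remaining markup. It assumes a browser environment (DOMParser is not built into Node.js), and the function names removeAllAttributes and cleanPastedHtml are hypothetical.

```javascript
// Removes every attribute from every element inside `container`.
// `container` is expected to be a DOM node with querySelectorAll,
// e.g. the <body> of a document produced by DOMParser.
function removeAllAttributes(container) {
  for (const element of container.querySelectorAll('*')) {
    // getAttributeNames() returns a new array, so removing while looping is safe.
    for (const name of element.getAttributeNames()) {
      element.removeAttribute(name);
    }
  }
}

// Browser-only: parses the pasted HTML, strips all attributes,
// and serializes the cleaned body back to a string.
function cleanPastedHtml(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  removeAllAttributes(doc.body);
  return doc.body.innerHTML;
}
```

In a browser, cleanPastedHtml(stringhtml) with the question's string returns "<div>some text</div>", and any nested tags are kept. If some attributes must survive (for example href on links), the inner loop could skip names found in an allow-list instead of removing everything.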

Solution 1: score 0, accepted, 2021-10-01 05:03:34

使用正则表达式从 html 字符串值中删除 html 属性

问题描述

1 个解决方案

解决方案1 0 已采纳 2021-10-01 05:03:34

解决方案1
0 已采纳 2021-10-01 05:03:34