Remove unquoted attribute from string

I have an issue parsing DOM elements when the text contains something like the markup below. I want to remove the highlighted text from the string using JavaScript. Can you please help me with this? I would like to rely on regular expressions for it.

I know how to get quoted attributes using standard string functions and also using a DOM parser.

For nodes like the one below, string functions such as replace and slice may work, but I would need to traverse the entire string, which is a performance issue.

So I want to use regular expressions to find such attributes in a node.

  <p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'> 

In the above example I want to remove the class attribute, and the class name could be anything. These nodes are generated by MS Word and are not under my control.

EDIT: The following is the pattern I am using to search for unquoted text, but it is not working.

var pattern = /<p class=\s*=\s*([^" >]+)/im

Regex101 Example

Regex:
\S+?=[^'"]\S*[^'"\s]

The tricky part with this one is finding the end of the unquoted attribute. In this example I'm assuming it will not contain any whitespace characters, so I can use the first occurrence of whitespace to terminate the match.
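To make the idea concrete, here is a minimal JavaScript sketch that removes an unquoted attribute from every tag in a string. It relies on the same assumption as above (an unquoted value ends at the first whitespace or `>`), and it additionally assumes that no quoted attribute value contains a `>` character. The function name `removeUnquotedAttribute` and its parameters are made up for illustration; this is one possible approach, not the only one.

```javascript
// Hypothetical helper: removes an unquoted attribute (e.g. class=Foo)
// from every tag in an HTML string. Quoted attributes are left alone.
function removeUnquotedAttribute(html, attrName) {
  // Escape regex metacharacters in the attribute name.
  const name = attrName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  // Alternative 1 captures a whole quoted value so it can be kept as-is
  // (this also protects text such as "class=x" inside a quoted style value).
  // Alternative 2 matches: whitespace, name, '=', then an unquoted value
  // that runs until the first whitespace, '>' or quote.
  const pattern = new RegExp(
    String.raw`("[^"]*"|'[^']*')|\s+${name}\s*=\s*[^\s>"']+`,
    "gi"
  );

  // Only look inside tags, so ordinary text like "class=x" is not touched.
  // Assumption: no quoted attribute value contains '>'.
  return html.replace(/<[a-z][^>]*>/gi, (tag) =>
    tag.replace(pattern, (match, quoted) => (quoted !== undefined ? quoted : ""))
  );
}

const input =
  "<p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'>";
console.log(removeUnquotedAttribute(input, "class"));
// prints: <p style='text-indent:-.25in;mso-list:l0 level1 lfo1'>
```

A quoted attribute such as `class="keep"` is not removed, because the unquoted-value part of the pattern cannot start with a quote character.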


 