Regex replace text outside HTML tags

Question

I have this HTML:

"This is simple html text <span class='simple'>simple simple text text</span> text"

I need to match only words that are outside any HTML tag. For example, if I want to match “simple” and “text”, the results should come only from “This is simple html text” and the trailing “text”: “simple” gives 1 match and “text” gives 2 matches. Could anyone help me with this? I'm using jQuery.

var pattern = new RegExp("(\\b" + value + "\\b)", 'gi');

if (pattern.test(text)) {
    text = text.replace(pattern, "<span class='notranslate'>$1</span>");
}

value is the word I want to match (in this case “simple”).
text is "This is simple html text <span class='simple'>simple simple text text</span> text"

I need to wrap every selected word (“simple” in this example) in <span class='notranslate'>, but only where the word is outside any HTML tag. The result for this example should be:

This is <span class='notranslate'>simple</span> html <span class='notranslate'>text</span> <span class='simple'>simple simple text text</span> <span class='notranslate'>text</span>

I do not want to replace any text inside

<span class='simple'>simple simple text text</span>

It should be the same as before the replacement.

Answer 1

Okay, try using this regex:

(text|simple)(?![^<]*>|[^<>]*</)

Worked example on regex101.

Breakdown:

(         # Open capture group
  text    # Match 'text'
|         # Or
  simple  # Match 'simple'
)         # End capture group
(?!       # Negative lookahead start (will cause match to fail if contents match)
  [^<]*   # Any number of non-'<' characters
  >       # A > character
|         # Or
  [^<>]*  # Any number of non-'<' and non-'>' characters
  </      # The characters < and /
)         # End negative lookahead.

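The negative lookahead rejects a match when text or simple sits inside a tag (an attribute value, for instance), or when a closing tag follows before any other tag opens, which is the case for text between an opening and a closing tag.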
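A minimal sketch of how this regex could be plugged into the code from the question. The helper names escapeRegExp and wrapOutsideTags are hypothetical, and the \b word boundaries are carried over from the question's pattern rather than from the regex above.

```javascript
// Hypothetical helper: escape regex metacharacters in a user-supplied word.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap whole-word matches that the lookahead
// considers to be outside any tag.
function wrapOutsideTags(html, words) {
  var alternation = words.map(escapeRegExp).join("|");
  var pattern = new RegExp(
    "\\b(" + alternation + ")\\b(?![^<]*>|[^<>]*</)",
    "gi"
  );
  return html.replace(pattern, "<span class='notranslate'>$1</span>");
}

var html = "This is simple html text <span class='simple'>simple simple text text</span> text";
var result = wrapOutsideTags(html, ["simple", "text"]);
console.log(result);
```

On the sample string this should reproduce the result the question asks for. One limitation worth knowing: the lookahead only checks whether the next tag is a closing tag, so in markup such as <div>simple <b>x</b></div> the word before <b> is still wrapped even though it sits inside the div.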

Answer 2

^([^<]*)<\w+.*/\w+>([^<]*)$

However, this is a very naive expression. It would be better to use a DOM parser.
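To see what this expression captures and why it is naive, here is a small sketch. The function countOutside is a hypothetical helper; it assumes the word contains only letters, and its fallback to the whole string when nothing matches is also an assumption.

```javascript
// Answer 2's expression: group 1 is the text before the first tag,
// group 2 is the text after the last closing tag.
var outerText = /^([^<]*)<\w+.*\/\w+>([^<]*)$/;

// Hypothetical helper: count whole-word hits in the two captured parts.
function countOutside(html, word) {
  var m = html.match(outerText);
  // Assumption: with no tag at all, treat the whole string as outside text.
  var outside = m ? m[1] + " " + m[2] : html;
  var hits = outside.match(new RegExp("\\b" + word + "\\b", "gi"));
  return hits ? hits.length : 0;
}

var html = "This is simple html text <span class='simple'>simple simple text text</span> text";
console.log(countOutside(html, "simple"));
console.log(countOutside(html, "text"));

// Text between two sibling elements is swallowed by the greedy .*
console.log(countOutside("a <b>x</b> simple <i>y</i> c", "simple"));
```

For the sample string the counts match the question (1 for “simple”, 2 for “text”), but the last call returns 0 because the word between the two elements is never captured.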

Answer 1: 83, accepted, 2013-09-04 19:51:00

Answer 2: 1, 2013-09-04 18:56:10