

PHP/Perl regular expression help!

I have a string:

$string = 'This is my big <span class="big-string">string</span>';

I cannot figure out how to write a regular expression that will replace the 'b' in 'big' without replacing the 'b' in 'big-string'. I need to replace all occurrences of a substring except when that substring appears inside an HTML tag.

Any help is appreciated!

Edit

Maybe some more info will help. I'm working on an autocomplete feature that highlights whatever you're searching for in the current result set. Currently, if you have typed 'aut' in the search dialog, the results look like this: <b>aut</b>omotive

The problem appears when I search for 'auto b'. First I replace all occurrences of 'auto' with '<b>auto</b>', then I replace all occurrences of 'b' with '<b>b</b>'. Unfortunately, this second sweep changes '<b>auto</b>' to '<<b>b</b>>auto</<b>b</b>>'.
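A minimal PHP reproduction of the problem, assuming a single hypothetical result string 'automobile' (the question does not show the actual result set). Each pass is a plain `str_replace` over the whole string, so the second pass cannot tell the `b` inside the markup added by the first pass from the `b` in the text:

```php
<?php
// Naive two-pass highlighting for the search "auto b".
// Each str_replace scans the whole string, markup included.
$result = 'automobile';
$result = str_replace('auto', '<b>auto</b>', $result); // <b>auto</b>mobile
$result = str_replace('b', '<b>b</b>', $result);       // also rewrites the "b" in both <b> tags
echo $result, "\n"; // <<b>b</b>>auto</<b>b</b>>mo<b>b</b>ile
```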

Please do not try to parse HTML using regular expressions. Just load the HTML into a DOM, walk over the text nodes, and do a simple str_replace on each one. You'll thank me around debugging time.
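A minimal PHP sketch of that approach, assuming the DOM extension (ext-dom) is available and the input is an HTML fragment; `replaceInTextNodes` is a hypothetical helper name. It wraps the fragment in a `<div>` so libxml has a single root, selects every text node with the XPath query `//text()`, and runs `str_replace` on each node's character data, so tag names and attribute values such as `class="big-string"` are never touched:

```php
<?php
// Hypothetical helper: replace $search with $replace in text nodes only,
// leaving tag names and attribute values alone. Needs ext-dom.
function replaceInTextNodes(string $html, string $search, string $replace): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Wrap the fragment in one root element; NOIMPLIED/NODEFDTD stop libxml
    // from adding <html><body> and a doctype around it.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $textNode) {
        // Edit the character data in place; markup never reaches str_replace.
        $textNode->data = str_replace($search, $replace, $textNode->data);
    }

    // Serialize the wrapper's children, dropping the wrapper <div> itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo replaceInTextNodes('This is my big <span class="big-string">string</span>', 'big', 'huge'), "\n";
```

Because the replacement is written into a text node, any markup in `$replace` is escaped on output (`<b>` becomes `&lt;b&gt;`); to add real `<b>` highlighting you would split the text node and insert an element instead. Also note that `loadHTML` assumes ISO-8859-1 unless the input declares a charset, so non-ASCII text may need an encoding hint.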

Is there a guarantee that 'big' won't be immediately preceded by "? If so, then s/([^"])b/$1foo/ should replace the b in question with foo.
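In PHP the same substitution can be written with `preg_replace`, shown here on the question's sample string. Note that `preg_replace` replaces every match by default, like adding `/g` in Perl. The negative-lookbehind variant is an alternative added for illustration, not part of the answer. The rule only inspects the single preceding character, so it does not solve the autocomplete case: the `b` in `<b>` is preceded by `<`, not `"`.

```php
<?php
$string = 'This is my big <span class="big-string">string</span>';

// PHP equivalent of s/([^"])b/$1foo/g: capture the character before the
// "b" (anything except a double quote) and put it back in front of "foo".
// Pass a limit of 1 as the fourth argument to mimic s/// without /g.
$result = preg_replace('/([^"])b/', '${1}foo', $string);
echo $result, "\n"; // This is my fooig <span class="big-string">string</span>

// A negative lookbehind states the same rule without capturing, and it
// also matches a "b" at the very start of the string.
$result2 = preg_replace('/(?<!")b/', 'foo', $string);
```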

If you insist upon using a regex, this one will do a pretty decent job:

$re = '/# (Crudely) match a sub-string NOT in an HTML tag.
    big        # The sub-string to be matched.
    (?=        # Assert we are not inside an HTML tag.
      [^<>]*   # Consume all non-<> up to...
      (?:<\w+  # either an HTML start tag,
      | <\/\w+ # or an HTML end tag,
      | $      # or the end of string.
      )        # End group of valid alternatives.
    )          # End "not-in-html-tag" lookahead assertion.
    /ix';

Caveats: This regex has very real limitations. The HTML must not have any angle brackets in the tag attributes. This regex also finds the target substring inside other parts of the HTML file, such as comments, scripts and stylesheets, and this may not be desirable.
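A sketch of how the lookahead idea could drive the two-pass highlighting from the question. `highlightOutsideTags` is a hypothetical helper, and the pattern is a compact single-line variant in which `<\/?\w+` accepts either a start tag or an end tag, so text immediately followed by a closing tag (as in `<b>auto</b>mobile`) still counts as outside a tag. `preg_quote` escapes the search term so characters such as `.` or `/` are matched literally. The caveats listed above still apply:

```php
<?php
// Hypothetical helper: wrap each case-insensitive occurrence of $term that is
// outside an HTML tag in <b>...</b>. After the match, the next angle bracket
// must begin a tag ("<name" or "</name"), or no angle bracket may remain.
function highlightOutsideTags(string $html, string $term): string
{
    $pattern = '/' . preg_quote($term, '/') . '(?=[^<>]*(?:<\/?\w+|$))/i';
    return preg_replace($pattern, '<b>$0</b>', $html); // $0 = matched text, original case kept
}

// The two-pass search for "auto b" from the question.
$html = highlightOutsideTags('automobile', 'auto');
$html = highlightOutsideTags($html, 'b');
echo $html, "\n"; // <b>auto</b>mo<b>b</b>ile
```

The second pass skips the `b` inside `<b>` and `</b>` because the next angle bracket after each of them is `>`, which fails the lookahead.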

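Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's URL. For any questions, contact: yoyou2525@163.com.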

 
© 2020-2024 STACKOOM.COM