Find and replace (part of) a string inside a comment block with a regular expression

Question

I'm trying to find a certain string that can occur inside a comment block. That string can be a whole word, but it can also be part of a word. For instance, suppose I'm looking for "codex": it should be replaced with "bindex" even when it is part of a longer word, so "codexing" should become "bindexing".

The catch is that this should only happen when the word is inside a comment block.

/* Lorem ipsum dolor sit amet, codex consectetur adipiscing elit. */

This word --> codex should not be replaced

/* Lorem ipsum dolor sit 
 * amet, codex consectetur 
 * adipiscing elit. 
 */

/** Lorem ipsum dolor sit 
 * amet, codex consectetur 
 * adipiscing elit. 
 */

// Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

# Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

------------------- Below "codex" is part of a word -------------------

/* Lorem ipsum dolor sit amet, somecodex consectetur adipiscing elit. */

/* Lorem ipsum dolor sit 
 * amet, codexing consectetur 
 * adipiscing elit. 
 */

And here also, this word --> codex should not be replaced

/** Lorem ipsum dolor sit 
 * amet, testcodexing consectetur 
 * adipiscing elit. 
 */

// Lorem ipsum dolor sit amet, __codex consectetur adipiscing elit.

# Lorem ipsum dolor sit amet, codex__ consectetur adipiscing elit.

What I have so far is this code:

$text = preg_replace ( '~(\/\/|#|\/\*).*?(codex).*?~', '$1 bindex', $text);

As you can see in this example, this isn't really working the way I'd like. It doesn't replace the word when it's inside a multiline /* */ comment block, and sometimes it also removes all the text in front of the word "codex".

How can I improve my regex so that it meets my requirements?

Answer 1

Since you're dealing with multi-line text here, you should use the s modifier (DOTALL) so that the dot also matches newlines and a match can span several lines. Also, the forward slash doesn't need to be escaped, because the pattern uses ~ as its delimiter.

Try this code:

$text = preg_replace ( '~(//|#|/\*).*?(codex).*?~s', '$1 bindex', $text );

Answer 2

$text = preg_replace ( '~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', $text );

Unlike anubhava's answer, this does not remove the comment text before 'codex'.
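
To make the difference between the two patterns concrete, here is a minimal sketch that runs both on one short block comment. The sample strings and variable names are made up for illustration; only the two patterns come from the answers above.

```php
<?php
// Sample input (hypothetical): one block comment containing the search word.
$text = "/* Lorem ipsum codex dolor */";

// Pattern from Answer 1: the text between the comment opener and "codex"
// is matched but not captured, so the replacement drops it.
$fromAnswer1 = preg_replace('~(//|#|/\*).*?(codex).*?~s', '$1 bindex', $text);

// Pattern from Answer 2: group 2 captures that text and puts it back.
$fromAnswer2 = preg_replace('~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', $text);

echo $fromAnswer1, "\n"; // /* bindex dolor */
echo $fromAnswer2, "\n"; // /* Lorem ipsum bindex dolor */

// Limitation: with the s modifier, .*? can run past the end of a // comment
// and reach a "codex" on a later line that is not inside any comment.
$leak = preg_replace('~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', "// note\ncodex = 1;");
echo $leak, "\n"; // the "codex" on the second line is replaced as well
```

Both patterns also replace only the first "codex" that follows each comment opener, so a comment containing the word twice keeps its second occurrence.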

Answer 3

This version can deal with any type of comment and will not fail on strings such as /**/ codex /**/ or /*xxxx codex codex xxxx*/:

$pattern = <<<'LOD'
~
# definitions
(?(DEFINE)
    (?<cl> (?> [^c\n]++ | c(?!odex) )++            )
    (?<c>  (?> [^*c]++ | \*++(?!/) | c(?!odex) )++ )
)

# pattern
(?|
    (?> (?>//|\#) \g<cl>*+ | \G(?<!^) \g<cl>?+ ) \K codex (\g<cl>*+)
  |
    (?> /\* \n*+ | \G(?<!^) (?!\n) ) \g<c>*+ \K codex (\n*+) 
)  
~x
LOD;
$replacement ="bindex$3";
$result = preg_replace($pattern, $replacement, $subject);
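
A short usage sketch may help here. The pattern is copied unchanged from this answer; the three sample subjects and the $results array are illustrative assumptions, and the behaviour was only checked against these inputs.

```php
<?php
// Pattern copied from the answer above, unchanged.
$pattern = <<<'LOD'
~
# definitions
(?(DEFINE)
    (?<cl> (?> [^c\n]++ | c(?!odex) )++            )
    (?<c>  (?> [^*c]++ | \*++(?!/) | c(?!odex) )++ )
)

# pattern
(?|
    (?> (?>//|\#) \g<cl>*+ | \G(?<!^) \g<cl>?+ ) \K codex (\g<cl>*+)
  |
    (?> /\* \n*+ | \G(?<!^) (?!\n) ) \g<c>*+ \K codex (\n*+)
)
~x
LOD;

// Hypothetical sample subjects: a block comment, a line comment with two
// occurrences, and a multi-line block comment followed by plain text.
$samples = [
    "/* codex */",
    "// codex and codex\ncodex",
    "/* a\n * codexing\n */\ncodex",
];

$results = [];
foreach ($samples as $subject) {
    // $3 re-inserts whatever the branch captured right after "codex".
    $results[] = preg_replace($pattern, 'bindex$3', $subject);
}

print_r($results);
```

In these three cases every "codex" inside a comment is replaced, including both occurrences on the // line, while the trailing "codex" on its own line is left alone.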

Answer 4

Something like this, using subgroups, should work:

$str = preg_replace(
    '~(<!--[a-zA-Z0-9 \n]*)(MYWORD)([a-zA-Z0-9 \n]*-->)~s',
    '$1$3',
    $input
);

You will just need to create a separate rule for each type of comment, and limit the characters allowed inside the comment with a character class (you might prefer to use a negated character class).
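
One way to sketch the "one rule per comment type" idea is shown below. Note that it swaps in a different technique from the snippet above: it matches each whole comment with preg_replace_callback and then does a plain str_replace inside the match. The function name replaceInComments and the sample input are hypothetical, and the sketch assumes comment markers never appear inside string literals.

```php
<?php
// Hypothetical helper, not taken from the answer above.
function replaceInComments(string $text, string $search, string $replace): string
{
    // One alternative per comment type:
    //   /\*.*?\*/   block comments (s modifier lets . cross newlines)
    //   //[^\n]*    line comments starting with //
    //   #[^\n]*     line comments starting with #
    $pattern = '~/\*.*?\*/|//[^\n]*|#[^\n]*~s';

    return preg_replace_callback(
        $pattern,
        static function (array $m) use ($search, $replace): string {
            // $m[0] is one complete comment; replace every occurrence in it.
            return str_replace($search, $replace, $m[0]);
        },
        $text
    );
}

$input = "/* codex */ codex\n// somecodex\ncodex # codex__\n";
$output = replaceInComments($input, 'codex', 'bindex');
echo $output;
```

Because the callback sees the whole comment, repeated occurrences and occurrences inside longer words are all handled by str_replace, and text outside the matched comments is never touched.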

Answer 5

As has been written hundreds, thousands, or maybe even millions of times before in different comments, regular expressions are NOT for parsing code or for finding errors in it.

Consider these examples:

// code to be replaced
var a = "/*code to be replaced*/";

/* code to be replaced
var b = "*/code to be replaced"; */

There is no way for you to parse the code with a regex (and yes, finding out whether a string is inside a comment block is parsing).

Find a parser library, or write a pared-down one of your own. If you do write one, keep all the different use cases of the script in mind and, in particular, how string literals will affect it.
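
As a rough idea of what such a pared-down scanner could look like, here is a minimal sketch. The function name replaceInCommentsScan is hypothetical, and the sketch assumes C-like syntax: string literals delimited by single or double quotes with backslash escapes, /* */ block comments, and // or # line comments. It is not a full parser for any real language.

```php
<?php
// Hypothetical single-pass scanner: copies string literals verbatim and
// applies the replacement only inside comments.
function replaceInCommentsScan(string $src, string $search, string $replace): string
{
    $out = '';
    $len = strlen($src);
    $i = 0;
    while ($i < $len) {
        $ch = $src[$i];
        $two = substr($src, $i, 2);
        if ($ch === '"' || $ch === "'") {
            // String literal: skip to the matching unescaped quote.
            $j = $i + 1;
            while ($j < $len && $src[$j] !== $ch) {
                $j += ($src[$j] === '\\') ? 2 : 1;
            }
            $j = min($j + 1, $len); // include the closing quote if present
            $out .= substr($src, $i, $j - $i);
            $i = $j;
        } elseif ($two === '/*') {
            // Block comment: ends at the first "*/", or at end of input.
            $end = strpos($src, '*/', $i + 2);
            $j = ($end === false) ? $len : $end + 2;
            $out .= str_replace($search, $replace, substr($src, $i, $j - $i));
            $i = $j;
        } elseif ($two === '//' || $ch === '#') {
            // Line comment: ends just before the next newline.
            $end = strpos($src, "\n", $i);
            $j = ($end === false) ? $len : $end;
            $out .= str_replace($search, $replace, substr($src, $i, $j - $i));
            $i = $j;
        } else {
            // Ordinary code character: copy unchanged.
            $out .= $ch;
            $i++;
        }
    }
    return $out;
}

$src = "// code to be replaced\nvar a = \"/*code to be replaced*/\";\n";
$scanned = replaceInCommentsScan($src, 'code', 'text');
echo $scanned;
```

On the first example above, the // comment is rewritten while the /* */ inside the string literal is left untouched. An apostrophe in ordinary non-comment text would be treated as the start of a string, which is one of the cases a real parser has to handle properly.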

Answer 1: score 3, accepted, 2013-08-05 20:01:44

Answer 2: score 2, 2013-08-05 20:22:35

Answer 3: score 1, 2013-08-05 21:40:07

Answer 4: score 0, 2013-08-05 19:57:14

Answer 5: score 0, 2013-08-05 20:00:09