Why can't grep handle this regular expression?

Question

I have a regular expression to find functions in files.

See how the expression works perfectly in PHP.

If I try to run the same regex with grep from the console, I get an error:

grep -rP "(_t\s*\(\s*([\'\"])(\d+)\2\s*,\s*([\'\"])(.*?)(?<!\\)\4\s*(?(?=,)[^\)]*\s*\)|\)))" application scripts library public data | sort -n | uniq

grep: unrecognized character after (?<

It looks like grep can't handle this part of the regex, (?<!\\), which is important for me.

Can anyone advise how to modify the regex so that grep works with it?

EDIT: String: _t('123', 'pcs.', '', $userLang) . $data['ticker'] . ' (' . $data['security_name'] . ')

Need to find:

index in function ('123')
text in function ('pcs.')
function itself
```
 > _t('123', 'pcs.', '', $userLang) 
```

Answer 1

Doing what I said in the comments solves your problem (using the data from the link):

$ cat file
_t('123', 'шт.', '', $userLang)  . $data['ticker'] . ' (' . $data['security_name'] . ')
$ grep -P '(_t\s*\(\s*(['"'"'"])(\d+)\2\s*,\s*(['"'"'"])(.*?)(?<!\\)\4\s*(?(?=,)[^\)]*\s*\)|\)))' file
_t('123', 'шт.', '', $userLang)  . $data['ticker'] . ' (' . $data['security_name'] . ')

The trick here is to use single quotes around the whole regex, then whenever you want a single quote, write '"'"', which means "close the original string, add a single quote within double quotes, then open a new single-quoted string". Another alternative, as proposed by glglgl, would be to use '\'', i.e. close the original string, add an escaped ' and open a new string.
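
As a minimal sketch of the two quoting idioms (the variable names `a` and `b` and the sample word are illustrative, not from the original post), both forms below should produce the same string containing a literal single quote:

```shell
# Idiom 1: close the single-quoted string, add ' inside double quotes, reopen.
a='it'"'"'s'
# Idiom 2: close the single-quoted string, add a backslash-escaped ', reopen.
b='it'\''s'
# printf with %s prints the arguments without interpreting them further.
printf '%s\n' "$a" "$b"
```

The second form is shorter, which is probably why it is the one used in the Perl command in this answer.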

Using single quotes prevents bash from interpreting the ! as a history expansion. As gniourf_gniourf mentions above, the other option would be to disable that behaviour, using set +o history (history expansion itself is controlled by the histexpand option, so set +H also turns it off).
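
Quoting also changes what the backslashes look like by the time grep sees them. A small sketch, assuming a POSIX-style shell run non-interactively (so history expansion is not involved); the variable names `dq` and `sq` are illustrative:

```shell
# Inside double quotes the shell collapses \\ to a single \,
# so the lookbehind would reach grep as (?<!\) instead of (?<!\\).
dq="(?<!\\)"
# Inside single quotes every character is kept literally.
sq='(?<!\\)'
printf 'double-quoted: %s\n' "$dq"
printf 'single-quoted: %s\n' "$sq"
```

This is one more reason to prefer single quotes around a PCRE pattern: the regex engine receives exactly the characters that were typed.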

Just as a suggestion, if you're looking to capture separate parts of the regex (and you're already using PCRE mode in grep), you could use Perl instead:

$ perl -lne '/(_t\s*\(\s*(['\''"])(\d+)\2\s*,\s*(['\''"])(.*?)(?<!\\)\4\s*(?(?=,)[^\)]*\s*\)|\)))/ && print "group 1: $1\ngroup 3: $3\ngroup 5: $5"' file
group 1: _t('123', 'шт.', '', $userLang)
group 3: 123
group 5: шт.
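
If only the matched call is needed rather than the whole line, grep's -o option can be combined with -P. The following is a rough sketch, not the original regex: the pattern is a simplified version written for illustration, the sample line uses ASCII text, and it assumes a GNU grep built with PCRE support.

```shell
# Sample line with the same shape as the string in the question.
line="_t('123', 'pcs.', '', \$userLang) . \$data['ticker']"
# Simplified pattern (an assumption for this sketch): _t( quoted digits,
# a quoted text whose closing quote is not preceded by a backslash,
# then everything up to the first closing parenthesis.
pattern='_t\s*\(\s*(['\''"])\d+\1\s*,\s*(['\''"]).*?(?<!\\)\2[^)]*\)'
# -o prints only the matched part; -P enables Perl-compatible regexes.
match=$(printf '%s\n' "$line" | grep -oP "$pattern")
printf '%s\n' "$match"
```

Unlike the Perl one-liner, this prints only the full match; separate capture groups still need Perl or a similar tool.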

Answer 2

I strongly recommend using the tokenizer extension to parse PHP files. Parsing a programming language requires a stateful parser; a single regex is stateless and therefore cannot provide this.

Here is an example of how to extract function names from a PHP source file; tracking function calls is possible as well:

$source = file_get_contents('some.php');

$tokens = token_get_all($source);
for($i = 0; $i < count($tokens); $i++) {
    $token = $tokens[$i];
    if(!is_string($token)) {
        if($token[0] === T_FUNCTION) {
            // Skip the whitespace token between the keyword 'function'
            // and the function's name
            $i+=2;
            // Avoid printing the opening bracket of a closure
            if($tokens[$i][0] === T_STRING) {
                echo $tokens[$i][1] . PHP_EOL;
            }
        }
    }   
}

In the comments you said that you also want to parse HTML and JS files. I recommend a DOM/JS parser for that.

2 answers

Answer 1
3 2015-02-17 10:10:30

Answer 2
0 2015-02-17 10:11:51