Difference between using grep regex pattern with or without quotes?

I'm learning from Linux Academy and the tutorial shows how to use grep and regex.

He puts his regex pattern between quotes, something like this:

grep 'pattern' file.txt

This seems to be the same as doing it without quotes:

grep pattern file.txt 

But when he does something like this, he needs to escape the { and }:

grep '^A\{1,4\}' file.txt 

And after doing some testing, these escape characters don't seem to be needed when writing the pattern without the quotes.

grep ^A{1,4} file.txt

So what is the difference between these two methods? Are the quotes necessary? Why are the escape characters needed in the first case?

Lastly, I've also seen other methods like grep -E and egrep. Which is the most common method that people use to grep with regex?

Edit: Thanks for the reminder that the pattern goes before the file.

Many thanks!

You can sometimes get away with omitting quotes, but it's safest not to. This is because the syntax of regular expressions overlaps that of filename wildcard patterns, and when the shell sees something that looks like a wildcard pattern (and it isn't in quotes), the shell will try to "expand" it into a list of matching filenames. If there are no matching files, it gets passed through unchanged, but if there are matches it gets replaced with the matching filenames.

Here's a simple example. Suppose you're trying to search file.txt for an "a" followed optionally by some "b"s, and print only the matches. So you run:

grep -o ab* file.txt

Now, ab* could be interpreted as a wildcard pattern looking for files that start with "ab", and the shell will interpret it that way. If there are no files in the current directory that start with "ab", this won't cause a problem. But suppose there are two, "abcd.txt" and "abcdef.jpg". Then the shell expands this to the equivalent of:

grep -o abcd.txt abcdef.jpg file.txt

...and then grep will search the files abcdef.jpg and file.txt for the regex pattern abcd.txt.
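To see the expansion for yourself, here is a minimal sketch that reproduces the situation in a scratch directory. The helper show_args and the variable names are made up for this illustration; it prints each argument the command would receive on its own line instead of actually running grep.

```shell
# Work in a scratch directory so the demo does not touch real files.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch abcd.txt abcdef.jpg file.txt

# Hypothetical helper: print every argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Unquoted: the shell replaces ab* with the matching filenames first.
unquoted=$(show_args grep -o ab* file.txt)
# Quoted: the pattern reaches the command unchanged.
quoted=$(show_args grep -o 'ab*' file.txt)

echo "unquoted:"; echo "$unquoted"
echo "quoted:";   echo "$quoted"

# Clean up the scratch directory.
cd "$start_dir" && rm -rf "$demo_dir"
```

In the unquoted case the list contains the two filenames where the pattern used to be; in the quoted case it contains the literal ab*.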

So, basically, using an unquoted regex pattern might work, but is not safe. So don't do it.

BTW, I'd also recommend using single-quotes instead of double-quotes, because there are some regex characters that are treated specially by the shell even when they're in double-quotes (mostly dollar sign and backslash/escape). Again, they'll often get passed through unchanged, but not always, and unless you understand the (somewhat messy) parsing rules, you might get unexpected results.
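As a small illustration of the difference, the sketch below stores the same characters once in single-quotes and once in double-quotes and prints what the shell actually keeps. The variable user and its value are invented for the example.

```shell
user=alice   # hypothetical variable, only here to show expansion

# Single-quotes: every character is kept literally.
single='^\\$user'
# Double-quotes: \\ collapses to one backslash and $user is expanded.
double="^\\$user"

printf 'single quotes: %s\n' "$single"   # prints: single quotes: ^\\$user
printf 'double quotes: %s\n' "$double"   # prints: double quotes: ^\alice
```

So a pattern that relies on a literal $ or on a doubled backslash reaches grep intact only in the single-quoted form.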

BTW^2, for similar reasons you should (almost) always put double-quotes around variable references (e.g. grep -o 'ab*' "$filename" instead of grep -o 'ab*' $filename). Single-quotes don't allow variable references at all; unquoted variable references are subject to word splitting and wildcard expansion, both of which can cause trouble. Double-quoted variables get expanded and nothing else.
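A quick way to see word splitting in action is to count the arguments a command receives. This is only a sketch: count_args is a made-up helper, the filename is invented, and it assumes the default IFS.

```shell
# Hypothetical helper: report how many arguments were passed.
count_args() { echo "$#"; }

filename='my notes.txt'   # a name containing a space

# Unquoted: the value is split at the space into two words.
unquoted_count=$(count_args $filename)
# Double-quoted: the value stays one argument.
quoted_count=$(count_args "$filename")

echo "unquoted: $unquoted_count argument(s)"   # prints: unquoted: 2 argument(s)
echo "quoted:   $quoted_count argument(s)"     # prints: quoted:   1 argument(s)
```

With the unquoted form, grep would be asked to search two files named "my" and "notes.txt" rather than the one file you meant.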

BTW^3, there are a bunch of variants of regular expression syntax. The reason the curly braces in your example expression need to be escaped is that, by default, grep uses POSIX "basic" regular expression syntax ("BRE"). In BRE syntax, some regex special characters (including curly brackets and parentheses) must be escaped to have their special meaning (and some others, like alternation with |, are just not available at all). grep -E, on the other hand, uses "extended" regular expression syntax ("ERE"), in which those characters have their special meanings unless they're escaped.

And then there's the Perl-compatible syntax (PCRE), and many other variants. Using the wrong variant of the syntax is a common cause of trouble with regular expressions (e.g. using Perl extensions in an ERE context). It's important to know which variant the tool you're using understands, and write your regex to that syntax.

Here's a simple example: "a", followed by 1 to 3 space-like characters, followed by "b", in various regex syntax variants:
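The pattern from the question can be checked in both syntaxes with a few lines of made-up input. This is a sketch rather than a definitive test; the sample text and variable names are assumptions. Both patterns are quoted on purpose: in bash, an unquoted {1,4} is additionally subject to brace expansion, so grep may not even receive the pattern as typed.

```shell
# Hypothetical sample input: two lines start with "A", one does not.
sample=$(printf '%s\n' 'AAA first' 'B second' 'A third')

# BRE (plain grep): the braces must be escaped to act as a repetition count.
bre_count=$(printf '%s\n' "$sample" | grep -c '^A\{1,4\}')

# ERE (grep -E): the braces are special without escaping.
ere_count=$(printf '%s\n' "$sample" | grep -E -c '^A{1,4}')

# Alternation with | is an ERE feature.
alt_count=$(printf '%s\n' "$sample" | grep -E -c '^(A|B)')

echo "BRE matches: $bre_count"           # prints: BRE matches: 2
echo "ERE matches: $ere_count"           # prints: ERE matches: 2
echo "alternation matches: $alt_count"   # prints: alternation matches: 3
```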

a[[:space:]]\{1,3\}b    # BRE syntax
a[[:space:]]{1,3}b      # ERE syntax
a\s{1,3}b               # PCRE syntax

Just to make things more complicated, some tools will nominally accept one syntax, but also allow some extensions from other syntax variants. In the example above, you can see that Perl added the shorthand \s for a space-like character, which is not part of either POSIX standard syntax. But in fact many tools that nominally use BRE or ERE will actually accept the \s shorthand.

Actually, there are two completely unrelated aspects of escaping in your question. The first has to do with how to represent strings in bash. This is about readability, which usually means personal taste. For example, I don't like escaping, hence I prefer writing ab\ cd as 'ab cd'. Hence, I would write

echo 'ab cd'
grep -F 'ab cd' myfile.txt

instead of

echo ab\ cd
grep -F ab\ cd myfile.txt

but there is nothing wrong with either one, and you can choose whichever looks simpler to you.

The other aspect is indeed related to grep, at least as long as you do not use the -F option, which always interprets the search argument literally. In this case, the shell is not involved, and the question is whether a certain character is interpreted as a regexp character or as a literal. Gordon Davisson has already explained this in detail, so I give only an example which combines both aspects:

Say you want to grep for a space, followed by one or more periods, followed by another space. You can't write this as

grep -E  .+  myfile.txt

because the spaces would be eaten by bash and the . would have special meaning to grep. Hence, you have to choose some escape mechanism. My personal style would be

grep -E ' [.]+ ' myfile.txt

but many people dislike the [.] and prefer \. instead. This would then become

grep -E ' \.+ ' myfile.txt

This still uses quotes to protect the spaces from the shell, but escapes the period for grep. If you prefer to use no quotes at all, you can write

grep -E \ \\.+\  myfile.txt

Note that you need to prefix the \ which is intended for grep with another \, because the backslash has, like a space, a special meaning for the shell, and if you did not write \\., grep would not see a backslash-period, but just a period.
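To compare the spellings, the following sketch prints the pattern argument each one hands to the command, wrapped in brackets so the spaces are visible, and then runs one of them against a few invented lines. show_pattern and the sample text are assumptions made for this illustration.

```shell
# Hypothetical helper: print the first argument between brackets.
show_pattern() { printf '[%s]\n' "$1"; }

bare=$(show_pattern  .+  myfile.txt)             # spaces are dropped by the shell
bracket=$(show_pattern ' [.]+ ' myfile.txt)      # quoted, period in a bracket expression
quoted=$(show_pattern ' \.+ ' myfile.txt)        # quoted, period escaped for grep
unquoted=$(show_pattern \ \\.+\  myfile.txt)     # no quotes, everything escaped for the shell

printf '%s\n' "$bare" "$bracket" "$quoted" "$unquoted"

# Only the first sample line contains space, periods, space.
sample=$(printf '%s\n' 'wait ... here' 'a.b' 'no dots')
matches=$(printf '%s\n' "$sample" | grep -E -c ' \.+ ')
echo "matching lines: $matches"   # prints: matching lines: 1
```

The quoted and fully escaped spellings produce the identical argument, so the choice between them is purely a matter of readability.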
