
Grep pattern between quotes

I'm trying to grep a code base to find alphanumeric codes between quotes. So, for example, my code base might contain the line:

some stuff "A234DG3" maybe more stuff

And I'd like to output: A234DG3

I'm lucky in that I know my string is 7 characters long and contains only digits and the letters A-Z, a-z.

After a bit of playing I've come up with the following, but it's just not producing what I'd like:

grep -ro '".*"' . | grep [A-Za-z0-9]{7} | less

Where am I going wrong here? It feels like grep should give me what I want, but am I better off using something else? Cheers!

Using basic or extended POSIX regular expressions, there is no way to make grep print only the value between the quotes, without the quotes themselves. Because of that, I would use sed for a portable solution:

sed -n 's/.*"\([^"]\{1,\}\)".*/\1/p' <<< 'some stuff "A234DG3" maybe more stuff'
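One caveat with this kind of sed substitution is worth seeing in action: the leading .* is greedy, so when a line holds several quoted strings only the last one is printed. The sketch below uses a made-up sample line (an assumption, not from the question) and the POSIX interval \{1,\} in place of the GNU-only \+.

```shell
# Hypothetical sample line with three quoted strings.
line='a "bcd" efg "hij" klm "nop" q'

# The greedy leading .* swallows as much as it can, so the capture
# group ends up holding the LAST quoted string on the line.
last=$(printf '%s\n' "$line" | sed -n 's/.*"\([^"]\{1,\}\)".*/\1/p')

printf '%s\n' "$last"
```

So this form is a good fit when each line holds at most one quoted value; for several values per line, a tool that reports every match (such as grep -o) is likely the easier route.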

However, if you have the GNU tools available, GNU grep supports PCRE expressions through the -P command-line option. You can use this:

grep -oP '.*?"\K[^"]+(?=")' <<< 'some stuff "A234DG3" maybe more stuff'

.*?" matches everything up to and including the first quote. \K discards what has been matched so far and therefore works like a handy, dynamic lookbehind assertion. (I could have used a real lookbehind, but I like \K.) [^"]+ matches the text between the quotes. (?=") is a lookahead assertion that ensures the match is followed by a " without including it in the match.
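Since the question says the codes are exactly 7 alphanumeric characters, the same \K and lookahead idea can be narrowed to that shape. This is only a sketch: it assumes GNU grep built with PCRE support, and the sample line is invented for illustration.

```shell
# Hypothetical sample: two 7-character codes and one quoted string that is too long.
line='x "A234DG3" y "toolongvalue" z "B7c9D0e"'

# "        opening quote (matched, then dropped from the output by \K)
# {7}      exactly seven alphanumeric characters
# (?=")    a closing quote must follow, but is not part of the match
codes=$(printf '%s\n' "$line" | grep -oP '"\K[A-Za-z0-9]{7}(?=")')

printf '%s\n' "$codes"
```

With -o, each match is printed on its own line, so the two 7-character codes come out separately and the longer quoted string is skipped.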

The problem is that a regular expression is pretty much required to match the longest sequence it can. So, given something like:

a "bcd" efg "hij" klm "nop" q

A pattern of ".*" should match: "bcd" efg "hij" klm "nop" (everything from the first quote to the last quote), not just "bcd".

You probably want a pattern more like "[^"]*" to match the open quote, an arbitrary number of non-quote characters, then a close quote.
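The difference between the two patterns is easy to check on the sample line above. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep both do, though the option is not in POSIX):

```shell
line='a "bcd" efg "hij" klm "nop" q'

# Greedy: one match running from the first quote to the last quote.
greedy=$(printf '%s\n' "$line" | grep -o '".*"')

# Negated class: each quoted string is matched on its own,
# because [^"]* cannot run past a closing quote.
bounded=$(printf '%s\n' "$line" | grep -o '"[^"]*"')

printf 'greedy:\n%s\n' "$greedy"
printf 'bounded:\n%s\n' "$bounded"
```

The first command prints a single long match, while the second prints the three quoted strings on separate lines, quotes included.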

So after more playing about I've come up with this, which gives me what I'm after:

grep -r -E -o '"[A-Za-z0-9]{7}"' . | less

The -E option enables extended regular expressions, which allows the {7} length matcher to be used as written.
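For comparison, here is a small sketch of how the repetition count is spelled in each syntax, plus one possible way to drop the surrounding quotes from the output. In grep's default basic syntax an unescaped {7} is not a repetition count, which is likely part of why the first attempt failed. The tr step is my own suggestion rather than something from the thread.

```shell
line='some stuff "A234DG3" maybe more stuff'

# Basic regular expressions (grep's default): the interval is written \{7\}.
bre=$(printf '%s\n' "$line" | grep -o '"[A-Za-z0-9]\{7\}"')

# Extended regular expressions (-E): a plain {7} works.
ere=$(printf '%s\n' "$line" | grep -oE '"[A-Za-z0-9]{7}"')

# The matches still include the quotes; tr -d removes them afterwards.
bare=$(printf '%s\n' "$line" | grep -oE '"[A-Za-z0-9]{7}"' | tr -d '"')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$bare"
```

When searching recursively with -r, grep prefixes each match with the file name; adding -h (supported by GNU and BSD grep) suppresses that prefix so only the codes remain after the tr step.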
