How can I get my regular expression to return only the first match on the line?
My data contains lines like this:
55 511 00,"805, 809, 810, 839, 840",J223,201,338,116,16,200,115,6,P,S,"8,5","25,74",47,242,"55,7"
I have tried
,"(.*)",
as a regular expression, but it captures too much of the line. This expression currently returns:
,"805, 809, 810, 839, 840",J223,201,338,116,16,200,115,6,P,S,"8,5","25,74",
but what I really want is just the first quoted string. Either of these would be a valid result:
,"805, 809, 810, 839, 840",
805, 809, 810, 839, 840
How can I capture only that first match?
You need to make the * lazy instead of greedy:
,"(.*?)",
or match any run of characters other than " instead:
,"[^"]*",
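The difference between the greedy, lazy, and negated-class patterns can be checked against the sample line from the question. The sketch below assumes Python's re module, which the question does not mention; the negated-class pattern has a capture group added so that all three can be compared the same way.

```python
import re

# Sample line from the question.
line = ('55 511 00,"805, 809, 810, 839, 840",J223,201,338,116,16,200,115,6,'
        'P,S,"8,5","25,74",47,242,"55,7"')

greedy = re.search(r',"(.*)",', line)      # .* runs to the last ", on the line
lazy = re.search(r',"(.*?)",', line)       # .*? stops at the first ",
negated = re.search(r',"([^"]*)",', line)  # [^"]* can never cross a quote

print(greedy.group(0))   # the over-long match described in the question
print(lazy.group(1))     # 805, 809, 810, 839, 840
print(negated.group(1))  # 805, 809, 810, 839, 840
```

On this data the lazy and negated-class versions give the same result. The negated class is the stricter of the two: it cannot span a quote character, whereas .*? will keep going past a quote if that is the only way to reach a ", sequence.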
Try
"([^"]+)
Group 1 will match
805, 809, 810, 839, 840
/"([^"]+)"/
This will do the job! It captures everything between the first pair of double quotes.
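Whether you get only the first match or every match also depends on which function of the regex engine you call. As an illustration, assuming Python's re module (not something the original posts use), search returns just the first match on the line while findall returns all of them:

```python
import re

# Sample line from the question.
line = ('55 511 00,"805, 809, 810, 839, 840",J223,201,338,116,16,200,115,6,'
        'P,S,"8,5","25,74",47,242,"55,7"')

pattern = re.compile(r'"([^"]+)"')

first = pattern.search(line)         # first match only
all_quoted = pattern.findall(line)   # every non-overlapping match

print(first.group(1))  # 805, 809, 810, 839, 840
print(all_quoted)      # ['805, 809, 810, 839, 840', '8,5', '25,74', '55,7']
```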
Your regex is greedy: the .* will match everything up to the final ", on the line.
So to make it non-greedy, add a ? after the * inside the parentheses:
,"(.*?)",
This makes the match stop at the first closing quote that is followed by a comma.
There are many ways to handle this, but the simplest and most generic is to use a non-greedy match if your regular expression engine supports it. If it does not, you have to build an expression that knows a lot more about the structure of your data.
Here's an example using Perl-compatible regular expressions. The -o option prints each match on its own line, and head -n1 keeps only the first one:
$ pcregrep -o '"(.*?)"' /tmp/foo | head -n1
"805, 809, 810, 839, 840"
Here's another example that uses pure Perl:
$ perl -ne 'print "$1\n" if /(".*?")/' /tmp/foo
"805, 809, 810, 839, 840"
Here's a third example that uses POSIX extended regular expressions, which do not support non-greedy matches, so it relies on a negated character class instead:
$ egrep -o '("[^"]+")' /tmp/foo | head -n1
"805, 809, 810, 839, 840"
You may also want to consider splitting your input into fields, and then testing each field until you find a match. A lot just depends on what facilities you have at your disposal.
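As a rough sketch of the field-splitting approach, and assuming the data is ordinary comma-separated text with double-quoted fields, Python's csv module can do the splitting. The csv reader does not report which fields were quoted, so this example uses "contains a comma" as a stand-in test for a quoted field; that test is an assumption about this data, not a general rule.

```python
import csv

# Sample line from the question.
line = ('55 511 00,"805, 809, 810, 839, 840",J223,201,338,116,16,200,115,6,'
        'P,S,"8,5","25,74",47,242,"55,7"')

# csv.reader accepts any iterable of lines; parse the single line into fields.
fields = next(csv.reader([line]))

# Test each field in order and stop at the first one that contains a comma,
# which in this data means it had to be quoted.
first_quoted = next(f for f in fields if "," in f)

print(first_quoted)  # 805, 809, 810, 839, 840
```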