Match words of a specific length, anchored, without doing magic math
Let's say I wanted to find all 12-letter words in /usr/share/dict/words that started with c and ended with er. Off the top of my head, a workable pattern could look something like:
grep -E '^c.{9}er$' /usr/share/dict/words
It finds:
cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...
But that .{9} bothers me. It feels too magical: it requires subtracting the total length of all the anchor characters from the number given in the original constraint.
Is there any way to rewrite this regex so it doesn't require doing this calculation up front, allowing a literal 12 to be used directly in the pattern?
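For what it's worth, the subtraction itself can at least be delegated to the shell instead of being done by hand. A minimal sketch, assuming a POSIX shell and a grep with -E; the function name `match_len` is hypothetical, and the prefix and suffix are assumed to contain no regex metacharacters:

```shell
#!/bin/sh
# Hypothetical helper: print stdin lines of exactly $1 characters that start
# with $2 and end with $3. The middle count (total - prefix - suffix) is
# computed by the shell with ${#var} instead of by hand.
# Assumes $2 and $3 contain no regex metacharacters and that $1 is at least
# as large as their combined length.
match_len() {
    total=$1 prefix=$2 suffix=$3
    middle=$((total - ${#prefix} - ${#suffix}))
    grep -E "^${prefix}.{${middle}}${suffix}\$"
}

# Demo on a small inline word list instead of /usr/share/dict/words.
printf '%s\n' cabinetmaker calligrapher carpenter computer \
    | match_len 12 c er
```

This still builds `.{9}` internally; it only moves the arithmetic out of your head.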
You can use the -x option, which selects only those matches that exactly match the whole line:
grep -xE '.{12}' /usr/share/dict/words | grep -x 'c.*er'
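Note that -x applies only to the grep invocation it is passed to, so each stage of the pipeline needs its own anchoring. A small sketch on a made-up inline word list (the function name `twelve_c_er` is hypothetical) shows why the second stage should be anchored too:

```shell
#!/bin/sh
# Hypothetical helper: stage 1 keeps lines of exactly 12 characters,
# stage 2 keeps lines that, as a whole (-x), start with c and end with er.
twelve_c_er() {
    grep -xE '.{12}' | grep -x 'c.*er'
}

# "accelerators" is 12 letters and contains "c...er" in the middle, but it
# neither starts with c nor ends with er, so the anchored stage drops it.
printf '%s\n' accelerators calcographer campylometer carpenter \
    | twelve_c_er
```

Without -x on the second grep, "accelerators" would slip through, because an unanchored 'c.*er' matches anywhere in the line.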
Or use the -P option, which interprets the pattern as a Perl-compatible regular expression (PCRE), together with a lookahead assertion:
grep -P '^(?=.{12}$)c.*er$' /usr/share/dict/words
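The lookahead is what lets the literal length appear in the pattern: `(?=.{12}$)` checks the length of the whole line without consuming anything, and `c.*er$` then matches from the same starting position. A sketch that takes the length, prefix and suffix as parameters, assuming GNU grep built with PCRE support (BSD/macOS grep has no -P); the function name `match_len_pcre` is hypothetical, and PCRE's `\Q...\E` quoting is used so the prefix and suffix match literally:

```shell
#!/bin/sh
# Hypothetical helper; requires GNU grep with -P (PCRE) support.
# (?=.{N}$) asserts the line is exactly N characters long without consuming
# input; \Q...\E makes the prefix and suffix match literally.
# Assumes neither the prefix nor the suffix contains the sequence \E.
match_len_pcre() {
    grep -P "^(?=.{$1}\$)\\Q$2\\E.*\\Q$3\\E\$"
}

printf '%s\n' calligrapher computer campanologer \
    | match_len_pcre 12 c er
```

For `12 c er`, the shell expands the pattern to `^(?=.{12}$)\Qc\E.*\Qer\E$`.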
You can use awk as an alternative and avoid this calculation:
awk -v len=12 'length($0) == len && /^c.*er$/' /usr/share/dict/words
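Going one step further, awk can skip the regex entirely and compare the prefix and suffix as plain strings with `substr()`, so no escaping is needed. A sketch, assuming a POSIX awk; the function name `match_len_awk` and the `pre`/`suf` variable names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: exact length plus literal prefix/suffix checks, using
# only length() and substr(), so no regex escaping is involved.
# Caveat: length() counts characters in multibyte-aware awks (e.g. gawk in a
# UTF-8 locale) but may count bytes in awks without multibyte support.
match_len_awk() {
    awk -v len="$1" -v pre="$2" -v suf="$3" '
        length($0) == len &&
        substr($0, 1, length(pre)) == pre &&
        substr($0, length($0) - length(suf) + 1) == suf
    '
}

printf '%s\n' cabinetmaker carpenter calcographer \
    | match_len_awk 12 c er
```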
I don't know grep that well, but some more advanced backtracking (NFA) regex implementations provide lookaheads and lookbehinds. If you can find a way to make those available to you, you could write:
^(?=c).{12}(?<=er)$
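GNU grep's -P option (PCRE) supports both lookaheads and lookbehinds, so the pattern above can be used with grep directly. A minimal sketch on an inline sample list, assuming GNU grep built with PCRE support:

```shell
#!/bin/sh
# (?=c) peeks at the first character without consuming it, .{12} consumes
# exactly 12 characters, (?<=er) looks back at the last two of them, and
# $ requires the line to end there.
printf '%s\n' campylometer computer calligrapher cylinder \
    | grep -P '^(?=c).{12}(?<=er)$'
```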
Maybe as a perl one-liner like this?
perl -ne 'print if /^(?=c).{12}(?<=er)$/' /usr/share/dict/words
One approach with GNU sed:
$ sed -nr '/^.{12}$/{/^c.*er$/p}' words
With BSD sed (Mac OS) it would be:
$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words
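The BSD form, with the `;` before the closing brace, is also accepted by GNU sed, and current GNU sed documents -E as equivalent to -r, so this version should run on both. A sketch on an inline word list:

```shell
#!/bin/sh
# -n suppresses sed's default printing of every line.
# Outer address: only lines of exactly 12 characters enter the { } block.
# Inner address: of those, print the ones that start with c and end in er.
printf '%s\n' campanologer carpenter calcographer accelerators \
    | sed -nE '/^.{12}$/{/^c.*er$/p;}'
```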