How to print matched regex pattern using awk?
Using awk, I need to find a word in a file that matches a regex pattern.
I only want to print the word matched by the pattern.
So if the line contains:
xxx yyy zzz
and the pattern is:
/yyy/
I want to get only:
yyy
EDIT: Thanks to kurumi, I managed to write something like this:
awk '{
    for (i = 1; i <= NF; i++) {
        # match() returns the position of the match, or 0 if there is none
        tmp = match($i, /[0-9]..?.?[^A-Za-z0-9]/)
        if (tmp) {
            print $i
        }
    }
}' "$1"
And this is what I needed :) Thanks a lot!
This is the very basic usage:
awk '/pattern/{ print $0 }' file
This asks awk to search for pattern using //, then print the line, which by default is called a record and is denoted by $0. At least read up on the documentation.
If you only want to print the matched word:
awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file
It sounds like you are trying to emulate GNU's grep -o behaviour. This will do that, provided you only want the first match on each line:
awk 'match($0, /regex/) {
print substr($0, RSTART, RLENGTH)
}
' file
Here's an example, using GNU's awk implementation (gawk):
awk 'match($0, /a.t/) {
print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art
Read about match, substr, RSTART and RLENGTH in the awk manual.
After that you may wish to extend this to deal with multiple matches on the same line.之后,您可能希望扩展它以处理同一行上的多个匹配项。
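One possible way to handle several matches per line, sketched in portable awk: keep calling match() on the part of the line that has not been consumed yet. The function name print_all_matches and the sample regex are made up for illustration, and the sketch assumes the regex never matches the empty string (otherwise the loop would not terminate).

```shell
# Hypothetical helper: print_all_matches REGEX [FILE...]
# Prints every non-overlapping match of REGEX, one per output line.
print_all_matches() {
  re=$1
  shift
  awk -v re="$re" '{
    line = $0
    # Repeatedly match against the remainder of the line.
    # Assumes re cannot match the empty string.
    while (match(line, re)) {
      print substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}

printf 'the cat sat on a mat\n' | print_all_matches '[a-z]at'
```

With the sample input this should print cat, sat and mat on separate lines, which is close to what grep -o does.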
gawk can get the matching part of every line using this as the action:
{ if (match($0,/your regexp/,m)) print m[0] }
match(string, regexp [, array]) If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp. If regexp contains parentheses, the integer-indexed elements of array are set to contain the portion of string matching the corresponding parenthesized subexpression.
http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions
If you are only interested in the last line of input and you expect to find only one match (for example a part of the summary line of a shell command), you can also try this very compact code, adapted from How to print regexp matches using `awk`?:
$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy
Or the more complex version with a partial result:
$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b
Warning: the awk match() function with three arguments only exists in gawk, not in mawk.
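Where gawk is not available, the partial result can probably still be obtained with the two-argument match() plus substr(). The sketch below is one way to do it; extract_value is a hypothetical helper name, and the offset 4 is simply the length of the literal prefix "yyy=" from the example above. Unlike the END version, it prints a value for every matching line, not only the last one.

```shell
# Hypothetical helper: print the value that follows "yyy=" on each line.
# Uses only two-argument match(), so it should also run under mawk.
extract_value() {
  awk '{
    if (match($0, /yyy=[^ ]+/)) {
      # Skip the 4-character prefix "yyy=" and keep only the value.
      print substr($0, RSTART + 4, RLENGTH - 4)
    }
  }' "$@"
}

echo "xxx=a yyy=b zzz=c" | extract_value
```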
Here is another nice solution using a lookbehind regex in grep instead of awk. This solution has lower requirements for your installation (note that -P needs a grep built with PCRE support, such as GNU grep):
$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b
If Perl is an option, you can try this:
perl -lne 'print $1 if /(regex)/' file
To implement case-insensitive matching, add the i modifier:
perl -lne 'print $1 if /(regex)/i' file
To print everything AFTER the match:
perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile
To print the match and everything after the match:
perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile
Off topic, but this can also be done using grep; posting it here in case anyone is looking for a grep solution:
echo 'xxx yyy zzze ' | grep -oE 'yyy'
Using sed can also be elegant in this situation. Example (replace each line with the matched group "yyy" from that line):
$ cat testfile
xxx yyy zzz
yyy xxx zzz
$ cat testfile | sed -r 's#^.*(yyy).*$#\1#g'
yyy
yyy
Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions
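One thing to watch with the sed command above: lines that do not contain "yyy" are not substituted and so are printed unchanged. A possible variant, assuming a sed that accepts -E for extended regexes (GNU and BSD sed both should), suppresses default output with -n and prints only the lines where the substitution happened. The function name print_yyy_only is made up for this sketch.

```shell
# Hypothetical helper: print only the captured group, and only for
# lines where the pattern actually matched.
print_yyy_only() {
  # -n turns off automatic printing; the trailing p prints a line
  # only when the substitution succeeded.
  sed -nE 's#^.*(yyy).*$#\1#p' "$@"
}

printf 'xxx yyy zzz\nno match here\nyyy xxx zzz\n' | print_yyy_only
```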
If you know what column the text/pattern you're looking for (e.g. "yyy") is in, you can just check that specific column to see if it matches, and print it.
For example, given a file with the following contents (called asdf.txt):
xxx yyy zzz
to print the second column only if it matches the pattern "yyy", you could do something like this:
awk '$2 ~ /yyy/ {print $2}' asdf.txt
Note that this will also match any line where the second column contains "yyy" anywhere in it, like these:
xxx yyyz zzz
xxx zyyyz
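If that substring behaviour is too loose, one option is to compare the field as a string instead of matching it as a regex, so that only an exact "yyy" passes. This is a minimal sketch; the function name print_exact_second_field is hypothetical.

```shell
# Hypothetical helper: print the second field only when it is exactly "yyy".
print_exact_second_field() {
  # == does a string comparison, so "yyyz" and "zyyyz" are rejected.
  awk '$2 == "yyy" { print $2 }' "$@"
}

printf 'xxx yyy zzz\nxxx yyyz zzz\nxxx zyyyz\n' | print_exact_second_field
```

An anchored regex such as $2 ~ /^yyy$/ should behave the same way and generalises to patterns that are not fixed strings.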