How to print matched regex pattern using awk?

Using awk, I need to find a word in a file that matches a regex pattern.

I only want to print the word matched with the pattern.

So if in the line, I have:

xxx yyy zzz

And pattern:

/yyy/

I want to only get:

yyy

EDIT: thanks to kurumi I managed to write something like this:

awk '{
        for(i=1; i<=NF; i++) {
                tmp=match($i, /[0-9]..?.?[^A-Za-z0-9]/)
                if(tmp) {
                        print $i
                }
        }
}' "$1"

and this is what I needed :) thanks a lot!

This is the very basic form:

awk '/pattern/{ print $0 }' file

Ask awk to search for pattern using //, then print out the line. By default each line is called a record and is denoted by $0. At least read up on the documentation.

If you only want to print out the matched word:

awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file
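The loop above compares each field to a fixed string. A minimal sketch of the same loop using a regex test instead is shown here; the function name matching_words is hypothetical, and the regex is passed in as an awk variable, so backslash escapes in it are interpreted by awk first.

```shell
# matching_words REGEX: print every whitespace-separated field that matches REGEX.
# (Hypothetical helper name; reads from standard input.)
matching_words() {
    awk -v re="$1" '{
        # Walk the fields of the current record and test each one
        for (i = 1; i <= NF; i++)
            if ($i ~ re) print $i
    }'
}

echo "xxx yyy zzz" | matching_words '^y+$'
```

With the sample line from the question this prints yyy. Note that it prints the whole field, not just the part of the field that matched.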

It sounds like you are trying to emulate GNU's grep -o behaviour. This will do that, provided you only want the first match on each line:

awk 'match($0, /regex/) {
    print substr($0, RSTART, RLENGTH)
}
' file

Here's an example, using GNU's awk implementation:

awk 'match($0, /a.t/) {
    print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art

Read about match, substr, RSTART and RLENGTH in the awk manual.

After that you may wish to extend this to deal with multiple matches on the same line.
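One possible way to do that extension, as a sketch rather than the answer author's own code: keep calling match() on the unconsumed remainder of the line. The function name print_all_matches is hypothetical. Because each search starts on a shortened string, a regex anchored with ^ can match again after the first hit, so this only approximates grep -o.

```shell
# print_all_matches REGEX [FILE...]: print each non-overlapping match on its own line.
# (Hypothetical helper name; uses only POSIX awk features.)
print_all_matches() {
    regex=$1
    shift
    awk -v re="$regex" '{
        line = $0
        # match() sets RSTART and RLENGTH; repeat on what is left of the line
        while (match(line, re)) {
            if (RLENGTH == 0) break   # guard: an empty match would loop forever
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

printf 'cat bat\nno\nmat at\n' | print_all_matches '[a-z]at'
```

For this input it prints cat, bat and mat, one per line; the trailing "at" is skipped because nothing in a-z precedes it.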

gawk can get the matching part of every line using this as action:

{ if (match($0,/your regexp/,m)) print m[0] }

match(string, regexp [, array]) If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp. If regexp contains parentheses, the integer-indexed elements of array are set to contain the portion of string matching the corresponding parenthesized subexpression. http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions

If you are only interested in the last line of input and you expect to find only one match (for example a part of the summary line of a shell command), you can also try this very compact code, adapted from "How to print regexp matches using `awk`?":

$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy

Or the more complex version with a partial result:

$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b

Warning: the awk match() function with three arguments only exists in gawk, not in mawk.
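Where gawk is not available, the partial result can be approximated with the two-argument match() plus substr(). The sketch below assumes the "key=value" layout of the example above; the function name value_of is hypothetical, and the key is used as part of a regex, so it should not contain regex metacharacters. Unlike the END version, it prints a value for every matching line.

```shell
# value_of KEY: print the value that follows "KEY=" on each input line.
# (Hypothetical helper name; uses only POSIX awk features.)
value_of() {
    awk -v key="$1" '{
        # Match "KEY=" followed by a run of non-space characters
        if (match($0, key "=[^ ]+")) {
            # Skip over "KEY=" so that only the value is printed
            print substr($0, RSTART + length(key) + 1, RLENGTH - length(key) - 1)
        }
    }'
}

echo "xxx=a yyy=b zzz=c" | value_of yyy
```

This prints b for the sample input.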

Here is another nice solution using a lookbehind regex in grep instead of awk. This solution has lower requirements on your installation, although it does need a grep that supports -P, such as GNU grep built with PCRE support:

$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b

If Perl is an option, you can try this:

perl -lne 'print $1 if /(regex)/' file

To implement case-insensitive matching, add the i modifier:

perl -lne 'print $1 if /(regex)/i' file

To print everything AFTER the match:

perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile

To print the match and everything after the match:

perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile

Off topic, but this can also be done using grep. Just posting it here in case anyone is looking for a grep solution:

echo 'xxx yyy zzze ' | grep -oE 'yyy'

Using sed can also be elegant in this situation. Example (replace line with matched group "yyy" from line):

$ cat testfile
xxx yyy zzz
yyy xxx zzz
$ cat testfile | sed -r 's#^.*(yyy).*$#\1#g'
yyy
yyy

Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions
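One caveat with the sed command above: lines that do not contain the pattern are printed unchanged. A small variation, sketched here with a hypothetical wrapper name only_matched, suppresses default output with -n and prints only the lines where the substitution succeeded. It uses -E, which both GNU and BSD sed accept for extended regular expressions.

```shell
# only_matched [FILE...]: print the captured group "yyy" only for lines that contain it.
# (Hypothetical wrapper name; the pattern is hard-coded to match the example.)
only_matched() {
    # -n suppresses automatic printing; the trailing p prints only substituted lines
    sed -n -E 's#^.*(yyy).*$#\1#p' "$@"
}

printf 'xxx yyy zzz\nno match here\nyyy xxx zzz\n' | only_matched
```

This prints yyy twice and drops the middle line.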

If you know what column the text/pattern you're looking for (e.g. "yyy") is in, you can just check that specific column to see if it matches, and print it.

For example, given a file called asdf.txt with the following contents:

xxx yyy zzz

to only print the second column if it matches the pattern "yyy", you could do something like this:

awk '$2 ~ /yyy/ {print $2}' asdf.txt

Note that this will also match any line where the second column merely contains "yyy", like these:

xxx yyyz zzz
xxx zyyyz
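If only an exact "yyy" in the second column should count, one option is a string comparison instead of a regex match; anchoring the regex as $2 ~ /^yyy$/ would behave the same way here. A minimal sketch, with a hypothetical function name:

```shell
# exact_second_field [FILE...]: print column 2 only when it is exactly "yyy".
# (Hypothetical helper name.)
exact_second_field() {
    # == compares the whole field as a string, so "yyyz" and "zyyyz" are rejected
    awk '$2 == "yyy" { print $2 }' "$@"
}

printf 'xxx yyy zzz\nxxx yyyz zzz\nxxx zyyyz\n' | exact_second_field
```

Of the three sample lines, only the first produces output (yyy).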
