
Print Text Between “<” and “>” in awk

I've got some sample data in the following form and need to extract the email address from it:

from=<user@mail.com> (<-- note that this corresponds to $7)
...
...

Currently I'm using this:

awk '/from=<.*>/ {print $7}' mail.log

However, that only selects the lines that match the regex.

When it comes to printing, it still prints the whole field, from=<user@mail.com>, as in the sample above.

You can use gsub to remove everything around < and >:

awk '{gsub(/(^[^<]*<|>.*$)/, "", $7)}1' file

The key point here is (^[^<]*<|>.*$), a regex that can be split into two alternatives, (A|B):

  • ^[^<]*< matches everything from the beginning of the field up to and including the first <.
  • >.*$ matches everything from the first > to the end of the field.
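
To see the two alternatives at work separately, here is a minimal sketch that applies each one with its own sub() call instead of a single gsub(). The variable names field and stripped are only illustrative, and the sample value mirrors the one in the question:

```shell
# Illustrative sample field in the from=<...> form from the question.
field='from=<user@mail.com>'

# First sub(): alternative A, removes the prefix up to and including "<".
# Second sub(): alternative B, removes ">" and anything after it.
stripped=$(printf '%s\n' "$field" |
    awk '{ sub(/^[^<]*</, ""); sub(/>.*$/, ""); print }')

printf '%s\n' "$stripped"   # user@mail.com
```

Both sub() calls here act on $0 because the sample is a single field; in the answer's command the third argument, $7, restricts the substitution to that field.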

Test

$ cat a
1 2 3 4 5 6 from=<user@mail.com> 8
1 2 3 4 5 6 <user@mail.com> 8
$ awk '{gsub(/(^[^<]*<|>.*$)/, "", $7)}1' a
1 2 3 4 5 6 user@mail.com 8
1 2 3 4 5 6 user@mail.com 8

Warning: gensub is a GNU awk (gawk) extension, and I'm told the regular awk command (often found on non-Linux systems) doesn't support this command:

awk '/from=<([^>]*)>/ { print gensub(/.*from=<([^>]*)>.*/, "\\1", "1");}' mail.log

The core of this is the gensub function. Given a regex, it performs a substitution (by default, operating on the whole line, $0) and returns the modified string rather than changing the input. The replacement, in this case, is "\\1", which refers to the first capture group. So we match the whole line (with the part we want captured in the middle), then return just the captured part.
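
If gensub is not available, one possible workaround (not part of the answer above, just a sketch using the standard match() and substr() functions) is to locate the from=<...> span and cut the address out of it by position. The variable names line and addr are illustrative, and the sample line is borrowed from the test data earlier on this page:

```shell
# Sample line taken from the test file shown earlier.
line='1 2 3 4 5 6 from=<user@mail.com> 8'

addr=$(printf '%s\n' "$line" | awk 'match($0, /from=<[^>]*>/) {
    # RSTART and RLENGTH describe the whole match "from=<...>".
    # Skip the 6-character prefix "from=<" and drop the trailing ">".
    print substr($0, RSTART + 6, RLENGTH - 7)
}')

printf '%s\n' "$addr"   # user@mail.com
```

Because match() is used as the pattern, lines without a from=<...> span print nothing, which matches the behaviour of the gensub version.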

GNU grep can handle this nicely if you use a positive lookbehind:

$ grep -Po '(?<=from=<)[^>]*' file
user@mail.com

This prints whatever lies between from=< and > on each matching line of file.

iiSeymour's answer is the simplest approach in this case, if you have GNU grep (as he states).
You could even simplify it a little with \K (which drops everything matched up to that point): grep -Po 'from=<\K[^>]*' file.

For those NOT using GNU grep (implementations without -P for PCRE (Perl-Compatible Regular Expression) support), you can use the following pipeline, which is not the most efficient but is easy to understand:

grep -o 'from=<[^>]*' file | cut -d\< -f2
  • -o causes grep to output only the matched part of the input, which includes from=< in this case.
  • The cut command then prints the substring after the < (the second field ( -f2 ) based on the delimiter < ( -d\< )), effectively printing only the email address.
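
The two stages can be inspected one at a time. The following is a minimal sketch with illustrative variable names (line, matched, addr) and the sample line from the test data earlier on this page; it assumes a grep that supports -o, which the pipeline above already relies on:

```shell
# Sample line taken from the test file shown earlier.
line='1 2 3 4 5 6 from=<user@mail.com> 8'

# Stage 1: grep -o keeps only the matched text, still carrying "from=<".
matched=$(printf '%s\n' "$line" | grep -o 'from=<[^>]*')

# Stage 2: cut splits on "<" and keeps the second field, the address.
addr=$(printf '%s\n' "$matched" | cut -d'<' -f2)

printf '%s\n' "$matched"   # from=<user@mail.com
printf '%s\n' "$addr"      # user@mail.com
```

Quoting the delimiter as '<' is equivalent to the backslash form \< used above; either keeps the shell from treating < as a redirection.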
