
Print Text Between “<” and “>” in awk

I've got some sample data in the following form and need to extract the email address from it:

from=<user@mail.com> (<-- note that this corresponds to $7)
...
...

Currently I'm using this:

awk '/from=<.*>/ {print $7}' mail.log

However, the regex only selects which lines get processed.

When it comes to printing, awk still prints the whole field (like in the first text box) rather than just the address.

You can use gsub to remove everything around the address, the < and > included:

awk '{gsub(/(^[^<]*<|>.*$)/, "", $7)}1' file

The key point here is (^[^<]*<|>.*$), a regex made of two alternatives, (A|B):

  • ^[^<]*< matches everything from the beginning of the field up to and including <.
  • >.*$ matches everything from > up to the end of the field.
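To see what each alternative contributes, the two halves can also be applied one at a time with sub(). This is only a sketch: the sample line is the one used in the test, and the variable names line, result and addr are made up for illustration.

```shell
# Sample line with the address in field 7 (same shape as the test data).
line='1 2 3 4 5 6 from=<user@mail.com> 8'

# Apply the two alternatives separately to a copy of field 7.
result=$(printf '%s\n' "$line" | awk '{
    addr = $7
    sub(/^[^<]*</, "", addr)   # block A: strip everything up to and including "<"
    sub(/>.*$/, "", addr)      # block B: strip ">" and everything after it
    print addr
}')

printf '%s\n' "$result"
```

Working on a copy leaves $7 untouched, so the rest of the record stays available if you need it.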

Test

$ cat a
1 2 3 4 5 6 from=<user@mail.com> 8
1 2 3 4 5 6 <user@mail.com> 8
$ awk '{gsub(/(^[^<]*<|>.*$)/, "", $7)}1' a
1 2 3 4 5 6 user@mail.com 8
1 2 3 4 5 6 user@mail.com 8

Warning: gensub is a GNU awk (gawk) extension, so I'm told the regular awk command (often found on non-Linux systems) doesn't support this command:

awk '/from=<([^>]*)>/ { print gensub(/.*from=<([^>]*)>.*/, "\\1", "1");}' mail.log

The core of this is the gensub function. Given a regex, it performs a substitution (by default operating on the whole line, $0) and returns the modified string instead of changing the input. The replacement, in this case, is "\\1", which refers to the first capture group. So we match the whole line (with something special in the middle), then return just the special bit.
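If gawk is not available, roughly the same result can be had with match() and substr(), which POSIX awk provides. This is a different technique from gensub, not a drop-in equivalent, and the helper name extract_from is hypothetical:

```shell
# Hypothetical helper: print the text between "from=<" and ">" for each matching line.
# Reads the files given as arguments, or standard input if there are none.
extract_from() {
    awk '
    match($0, /from=<[^>]*>/) {
        # RSTART and RLENGTH describe the matched text "from=<...>".
        # Skip the 6 characters of "from=<" and drop the trailing ">".
        print substr($0, RSTART + 6, RLENGTH - 7)
    }' "$@"
}

printf '%s\n' 'x y from=<user@mail.com> z' 'no address here' | extract_from
```

Only the first from=<...> on each line is printed, and lines without one are skipped, since match() returns 0 for them.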

GNU grep can handle this nicely if you use a positive lookbehind:

$ grep -Po '(?<=from=<)[^>]*' file
user@mail.com

This will print everything between from=< and the next > in file.

iiSeymour's answer is the simplest approach in this case, if you have GNU grep (as he states).
You could even simplify it a little with \K (which drops everything matched up to that point): grep -Po 'from=<\K[^>]*' file

For those NOT using GNU grep (that is, implementations without -P, the option for PCRE, Perl-Compatible Regular Expression, support), you can use the following pipeline, which is not the most efficient but is easy to understand:

grep -o 'from=<[^>]*' file | cut -d\< -f2
  • -o causes grep to output only the matched part of the input, which includes from=< in this case.
  • The cut command then prints the substring after the < (the second field, -f2, based on the delimiter <, -d\<), effectively printing only the email address.
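To make the two stages visible, here is the same pipeline split up, with the intermediate result kept in a variable. The sample input and the names sample, stage1 and stage2 are assumptions for illustration only:

```shell
# Sample input; the second line has no address and is dropped by grep.
sample='1 2 3 4 5 6 from=<user@mail.com> 8
a line without an address'

# Stage 1: grep -o keeps only the matched text, prefix included.
stage1=$(printf '%s\n' "$sample" | grep -o 'from=<[^>]*')

# Stage 2: cut splits on "<" and keeps the second field, i.e. the address.
stage2=$(printf '%s\n' "$stage1" | cut -d'<' -f2)

printf 'after grep: %s\nafter cut:  %s\n' "$stage1" "$stage2"
```

After the first stage the text is from=<user@mail.com, and after the second it is user@mail.com.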
