Print Text Between “<” and “>” in awk

Question

I've got some sample data in the following form and need to extract the email address from it:

from=<user@mail.com> (<-- note that this corresponds to $7)
...
...

Currently I'm using this:

awk '/from=<.*>/ {print $7}' mail.log

However, the pattern only selects the lines that match the regex.

When it comes to printing, awk still prints the whole field, from=<user@mail.com> (as in the sample above), rather than just the address.

Answer 1

You can use gsub to remove everything outside < and >, along with the delimiters themselves:

awk '{gsub(/(^[^<]*<|>.*$)/, "", $7)}1' file

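The key point here is (^[^<]*<|>.*$), a regex made of two alternatives, (A|B):

^[^<]*< matches everything from the beginning of the field up to and including the first <.
>.*$ matches everything from the first > to the end of the field.

The trailing 1 is an always-true pattern, so awk prints every line after the substitution.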
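To see what each alternative contributes, the sketch below applies them one at a time to a single sample value and then together. The variable names (field, left_trimmed, right_trimmed, both_trimmed) are made up for illustration and are not part of the original answer.

```shell
# Sample field value, taken from the question.
field='from=<user@mail.com>'

# Alternative A only: strip everything up to and including the first "<".
left_trimmed=$(printf '%s\n' "$field" | awk '{ sub(/^[^<]*</, ""); print }')

# Alternative B only: strip everything from the first ">" to the end.
right_trimmed=$(printf '%s\n' "$field" | awk '{ sub(/>.*$/, ""); print }')

# Both alternatives in a single gsub call, as in the answer.
both_trimmed=$(printf '%s\n' "$field" | awk '{ gsub(/^[^<]*<|>.*$/, ""); print }')

printf '%s\n' "$left_trimmed" "$right_trimmed" "$both_trimmed"
```

The three printed lines should be user@mail.com>, from=<user@mail.com and user@mail.com, in that order.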

Test

$ cat a
1 2 3 4 5 6 from=<user@mail.com> 8
1 2 3 4 5 6 <user@mail.com> 8
$ awk '{gsub(/(^[^<]*<|>.*$)/, "", $7)}1' a
1 2 3 4 5 6 user@mail.com 8
1 2 3 4 5 6 user@mail.com 8
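The command above prints every line in full. If only the address is wanted, and only for lines whose seventh field starts with from=<, one possible variation is sketched below. The function name extract_from_field7 and the second sample line are assumptions added for illustration.

```shell
# Hypothetical helper: print only the address from field 7,
# and only for lines where that field looks like from=<...>.
extract_from_field7() {
  awk '$7 ~ /^from=<[^>]*>/ {
    addr = $7                        # work on a copy so $0 is left untouched
    gsub(/^[^<]*<|>.*$/, "", addr)   # same regex as in the answer
    print addr
  }' "$@"
}

# Two sample lines: only the first has from=<...> in field 7.
sample='1 2 3 4 5 6 from=<user@mail.com> 8
1 2 3 4 5 6 to=<other@mail.com> 8'

result=$(printf '%s\n' "$sample" | extract_from_field7)
printf '%s\n' "$result"
```

With a log file, the same helper could be called as extract_from_field7 mail.log.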

Answer 2

Warning: gensub is a GNU awk (gawk) extension, so the regular awk command often found on non-Linux systems doesn't support this:

awk '/from=<([^>]*)>/ { print gensub(/.*from=<([^>]*)>.*/, "\\1", "1");}' mail.log

The core of this is the gensub function. Given a regex, it performs a substitution (by default on the whole line, $0) and returns the modified string instead of changing the line in place. The replacement here is "\\1", which refers to the first capture group, and the third argument, "1", says to replace the first match. The regex matches the whole line with the address captured in the middle, so gensub returns just the captured address.
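For awk implementations without gensub, a similar result can probably be had with the POSIX match() and substr() functions. This is a different technique from the one in the answer, offered as a minimal sketch. The function name extract_sender is hypothetical.

```shell
# Hypothetical helper using only POSIX awk features (no gensub).
extract_sender() {
  awk 'match($0, /from=<[^>]*>/) {
    # RSTART is the position of the "f" in "from=<".
    # Skip the 6 characters of "from=<" and drop the closing ">",
    # which removes 7 delimiter characters from the matched length.
    print substr($0, RSTART + 6, RLENGTH - 7)
  }' "$@"
}

sender=$(printf '%s\n' 'x y from=<user@mail.com> z' 'no match here' | extract_sender)
printf '%s\n' "$sender"
```

match() sets RSTART and RLENGTH for the first match on the line, so lines without from=<...> are skipped and only the first occurrence on a line is printed.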

Answer 3

GNU grep can handle this nicely if you use a positive lookbehind:

$ grep -Po '(?<=from=<)[^>]*' file
user@mail.com

This will print anything between from=< and > in file.

Answer 4

iiSeymour's answer is the simplest approach in this case, if you have GNU grep (as he states).
You could even simplify it a little with \K (which drops everything matched up to that point): grep -Po 'from=<\K[^>]*' file.

For those NOT using GNU grep (that is, implementations without -P for PCRE (Perl-Compatible Regular Expression) support), you can use the following pipeline, which is not the most efficient but is easy to understand:

grep -o 'from=<[^>]*' file | cut -d\< -f2

-o causes grep to output only the matched part of the input, which includes from=< in this case.
The cut command then prints the substring after the < (the second field (-f2) based on the delimiter < (-d\<)), effectively printing the email address only.
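If a single command is preferred over the pipeline, a sed substitution with a capture group is one more option that should work without GNU extensions. It is a different tool from the ones used in the answers, shown here only as a rough sketch. The function name extract_sender_sed is hypothetical.

```shell
# Hypothetical helper: -n suppresses default output, and the p flag prints
# only the lines where the substitution succeeded.
extract_sender_sed() {
  sed -n 's/.*from=<\([^>]*\)>.*/\1/p' "$@"
}

sed_sender=$(printf '%s\n' 'x y from=<user@mail.com> z' 'no match here' | extract_sender_sed)
printf '%s\n' "$sed_sender"
```

Because the leading .* is greedy, a line containing several from=<...> entries would yield the last one rather than the first.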

Answer details

Answer 1: score 4, accepted, 2015-03-03 11:19:11
Answer 2: score 1, 2015-03-03 11:25:12
Answer 3: score 1, 2015-03-03 11:29:43
Answer 4: score 1, 2015-03-03 15:03:33
