I have to use regex with sed or awk to find things in a log file. The log file looks like this:
Jan 16 08:33:18 mail.knurledwidgets.example.org sendmail[1618]: qhgKT0cN80gSX: to=<user1@company.example.com>, delay=00:00:02, xdelay=00:00:01, mailer=esmtp, pri=193069, relay=mx.company.example.com. [192.168.123.12], dsn=2.0.0, stat=Sent (OK <sp4jffaeid3FxjPGr@mx.company.example.com>)
Jan 16 08:33:04 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: Milter: connect to filters
Jan 16 08:33:06 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: from=<user1@dont-cross-the-memes.example.com>, size=38065260, class=-30, nrcpts=1, msgid=<gnDSaYSEaP4Yk/.F0EhYbIYcihGO8Vd.dont-cross-the-memes.example.com>, proto=ESMTP, daemon=MTA-v6, relay=proton.dont-cross-the-memes.example.com [192.168.98.234]
These are the three main line formats in the log file. I need to find received mail, i.e. the lines where the email address is preceded by "from". I have written this regex:
^Jan\s\d\d\s(\d\d).*\bfrom\b\=<(.*)>,\s\bsize\b.*
I have tested this regex in TextWrangler. It finds all the matching lines and replaces each one with the hour and the email address.
However, when I try to use this regex with sed or awk in a script, I run into problems.
This is my sed script:
#!/bin/bash
sed -E 's/^Jan\s\d\d\s(\d\d).*\bfrom\b\=<(.*)>,\s\bsize\b.*/\1 \2/g' output
I don't know why this code doesn't work; it doesn't replace anything. How do I fix it? Maybe awk is a better choice?
When parsing input that contains name=value data, I usually find it convenient to create an array that maps each name to its value, so I can simply access the values by name, e.g.:
$ cat tst.awk
{
delete n2v
for (i=1; i<=NF; i++) {
if ($i ~ /=/) {
name = value = $i
sub(/=.*/,"",name)
sub(/[^=]+=/,"",value)
gsub(/^<|[>,]+$/,"",value)
n2v[name] = value
}
}
for (name in n2v) {
value = n2v[name]
print ">", name, "=", value
}
print "-----"
}
"from" in n2v { print $1, $2, $3, n2v["from"] }
$ awk -f tst.awk file
> stat = Sent
> relay = mx.company.example.com.
> xdelay = 00:00:01
> to = user1@company.example.com
> dsn = 2.0.0
> mailer = esmtp
> delay = 00:00:02
> pri = 193069
-----
-----
> from = user1@dont-cross-the-memes.example.com
> relay = proton.dont-cross-the-memes.example.com
> nrcpts = 1
> class = -30
> size = 38065260
> proto = ESMTP
> msgid = gnDSaYSEaP4Yk/.F0EhYbIYcihGO8Vd.dont-cross-the-memes.example.com
> daemon = MTA-v6
-----
Jan 16 08:33:06 user1@dont-cross-the-memes.example.com
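If you only need the hour and the sender address, the debugging loop can be dropped. The following is a minimal sketch of the same array idea wrapped in a shell function; the function name hour_and_sender and the abbreviated sample lines are illustrative assumptions, and the function reads either the files you pass it or standard input:

```shell
#!/bin/bash
# Sketch: same name=value array idea as tst.awk, without the debug output.
# hour_and_sender is a hypothetical helper name; it reads files or stdin.
hour_and_sender() {
    awk '
    {
        delete n2v                              # start each line with an empty map
        for (i = 1; i <= NF; i++) {
            if ($i ~ /=/) {
                name = value = $i
                sub(/=.*/, "", name)            # keep the part before "="
                sub(/[^=]+=/, "", value)        # keep the part after "="
                gsub(/^<|[>,]+$/, "", value)    # drop a leading < and trailing > or commas
                n2v[name] = value
            }
        }
    }
    "from" in n2v {
        split($3, t, ":")                       # $3 is the HH:MM:SS timestamp
        print t[1], n2v["from"]
    }' "$@"
}

# Abbreviated sample lines taken from the question
printf '%s\n' \
  'Jan 16 08:33:04 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: Milter: connect to filters' \
  'Jan 16 08:33:06 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: from=<user1@dont-cross-the-memes.example.com>, size=38065260, class=-30' |
  hour_and_sender
```

For these two sample lines it should print 08 user1@dont-cross-the-memes.example.com. With the question's file you would run hour_and_sender output instead.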
You could also use awk (assuming you can match on " from=<" and the fields are always in the same order):
awk -F'[ :<>,]' '/ from=</ {print $3 " " $12}' output
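To see why the hour ends up in $3 and the address in $12, you can number the fields that this separator produces. A small sketch using the question's "from=" line, abbreviated after size= (the variable name line is just for illustration):

```shell
#!/bin/bash
# Sketch: print the first 13 fields that awk -F'[ :<>,]' produces.
# Every single character in the bracket expression is a separator, so
# ": " and ">," each create an empty field between the two characters.
line='Jan 16 08:33:06 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: from=<user1@dont-cross-the-memes.example.com>, size=38065260'

printf '%s\n' "$line" | awk -F'[ :<>,]' '{
    for (i = 1; i <= 13; i++)
        printf "$%d = [%s]\n", i, $i
}'
```

Here $3 is 08, $8 and $10 are empty, $11 is from= and $12 is the address. Because the field numbers depend on the exact layout before from=, a change in the host name or message-ID format would shift them, which is the "fields in the same order" caveat above.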
I think the problem is the \d syntax. It does not mean what you think. In GNU sed, \d is followed by a decimal value and matches the character with that code, so it causes your regex to fail. Replace each \d with [0-9], like:
sed -r 's/^Jan\s[0-9]{2}\s([0-9]{2}).*\bfrom\b=<(.*)>,\s\bsize\b.*/\1 \2/g' output
Note that I use the -r switch. In GNU sed, -E is simply the POSIX-standard name for the same extended-regex option, so it is not what breaks your command.
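A quick way to see the difference, assuming GNU sed (the \dNNN escape is a GNU extension, so other sed implementations may reject it or treat it differently):

```shell
#!/bin/bash
# Sketch assuming GNU sed: \dNNN is a GNU escape for "the character whose
# decimal code is NNN", not a digit class like \d in Perl-style regexes.
printf 'A1\n' | sed 's/\d065/x/'   # decimal 65 is "A", so this prints x1
printf 'A1\n' | sed 's/[0-9]/x/'   # the bracket expression matches the digit: Ax
```

The portable bracket expression [0-9] behaves the same way in every sed, which is why it is the safer replacement.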
For the only line that matches (the third one), this yields the line below; the non-matching lines are printed unchanged:
08 user1@dont-cross-the-memes.example.com
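If you only want the matching lines, you can turn off sed's automatic printing with -n and add the p flag so that only lines where the substitution succeeded are printed. A sketch, again assuming GNU sed (for \s, \b and -r); the function name extract_senders is just for illustration:

```shell
#!/bin/bash
# Sketch: -n disables sed's automatic printing, and the p flag prints a
# line only when the substitution succeeded. The g flag is dropped because
# the pattern already consumes the whole line, so it can only match once.
extract_senders() {
    sed -rn 's/^Jan\s[0-9]{2}\s([0-9]{2}).*\bfrom\b=<(.*)>,\s\bsize\b.*/\1 \2/p' "$@"
}

# Abbreviated sample lines taken from the question
printf '%s\n' \
  'Jan 16 08:33:04 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: Milter: connect to filters' \
  'Jan 16 08:33:06 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: from=<user1@dont-cross-the-memes.example.com>, size=38065260, class=-30' |
  extract_senders
```

With the question's file you would run extract_senders output, and only the "hour address" lines are printed.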