Change date format with awk

I have a log file that I'm trying to reformat using sed/awk/grep, but I'm running into difficulties with the date format. The log looks like this:

1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146

I would like the output to look like this:

Yealink,1.2.3.4,28-03-2019 11:43:58

I have tried the following:

grep Yealink access.log | grep 404 | sed 's/\[//g' | awk '{print "Yealink,",$1,",",strftime("%Y-%m-%d %H:%M:%S", $4)}' | sed 's/, /,/g' | sed 's/ ,/,/g'

Edit: based on the comments, I now remove the [ before passing the date string to strftime, but it still does not work as expected.

However, this returns a null date (the epoch), so clearly I have the strftime syntax wrong:

Yealink,1.2.3.4,1970-01-01 01:00:00

Update 2019-10-25: gawk is now getting strptime() in an extension library, see https://groups.google.com/forum/#!msg/comp.lang.awk/Ft6_h7NEIaE/tmyxd94hEAAJ


Original post: See the gawk manual for strftime(): it doesn't accept a time in any format except seconds since the epoch. If gawk had a strptime() then that would work, but it doesn't (and I can't persuade the maintainers to provide one), so you have to massage the timestamp into the "YYYY MM DD HH MM SS" format that mktime() can convert to seconds and then pass that to strftime(), e.g.:

$ awk '{
    split($4,t,/[[\/:]/)
    old  = t[4] " " (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 " " t[2] " " t[5] " " t[6] " " t[7];
    secs = mktime(old)
    new  = strftime("%d-%m-%Y %T",secs);
    print $4 ORS old ORS secs ORS new
}' file
[28/Mar/2019:11:43:58
2019 3 28 11 43 58
1553791438
28-03-2019 11:43:58

But of course you don't need mktime() or strftime() at all; just shuffle the date components around:

$ awk '{
    split($4,t,/[[\/:]/)
    new = sprintf("%02d-%02d-%04d %02d:%02d:%02d",t[2],(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3,t[4],t[5],t[6],t[7])
    print $4 ORS new
}' file
[28/Mar/2019:11:43:58
28-03-2019 11:43:58

That will work in any awk, not just GNU awk, since it doesn't require time functions.
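
As a rough sketch, the same shuffle can be extended to print the exact line asked for in the question. The function name reformat_log is hypothetical, and the /Yealink/ && /404/ test only mirrors the two grep filters from the question; it is an assumption that this loose match is good enough for your log.

```shell
# reformat_log: hypothetical wrapper; $1 is the path to the access log
reformat_log() {
    # mirror the question's "grep Yealink | grep 404" filters
    awk -v OFS=, '/Yealink/ && /404/ {
        # $4 looks like "[28/Mar/2019:11:43:58"; split on "[", "/" and ":"
        split($4, t, /[[\/:]/)
        # convert the month abbreviation to its number (Mar -> 3)
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[3]) + 2) / 3
        stamp = sprintf("%02d-%02d-%04d %02d:%02d:%02d", t[2], mon, t[4], t[5], t[6], t[7])
        print "Yealink", $1, stamp
    }' "$1"
}
```

Run as reformat_log access.log against the sample line from the question, this should print Yealink,1.2.3.4,28-03-2019 11:43:58.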

(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 is just the idiomatic way to convert a 3-char month name abbreviation (e.g. Mar) into the equivalent month number (3).
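
To see why the arithmetic works, here is a small sketch that applies the expression to a few month names. The helper name month_num is made up for illustration. Note that a string that is not one of the twelve abbreviations would not produce a valid month number, so this assumes well-formed input.

```shell
# month_num: hypothetical helper that maps a 3-letter month abbreviation to 1..12
month_num() {
    awk -v m="$1" 'BEGIN {
        # index() returns the 1-based offset of m in the string: Jan=1, Feb=4, Mar=7, ...
        # adding 2 and dividing by 3 turns those offsets into 1, 2, 3, ...
        print (index("JanFebMarAprMayJunJulAugSepOctNovDec", m) + 2) / 3
    }'
}

month_num Jan   # offset 1  -> (1+2)/3  = 1
month_num Mar   # offset 7  -> (7+2)/3  = 3
month_num Dec   # offset 34 -> (34+2)/3 = 12
```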

Another awk approach (thanks to @EdMorton for reviewing the getline usage).

The idea here is to call the date command from awk, since date accepts abbreviated month names. The timestamp needs some massaging before date will parse it:

$ date -d"28/Mar/2019:11:43:58 +0000" "+%F %T"  # Fails
date: invalid date ‘28/Mar/2019:11:43:58 +0000’
$ date -d"28 Mar 2019:11:43:58 +0000" "+%F %T"  # Again fails because of : before time section
date: invalid date ‘28 Mar 2019:11:43:58 +0000’
$ date -d"28 Mar 2019 11:43:58 +0000" "+%F %T" # date parses it, but the +0000 zone makes it convert to local time (UTC+05:30 here)
2019-03-28 17:13:58
$ date -d"28 Mar 2019 11:43:58" "+%F %T" # correct value after stripping +0000
2019-03-28 11:43:58
$
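
If you would rather keep the zone than strip it, one possible alternative (a sketch assuming GNU date, whose -u option prints in UTC) is to leave the +0000 in place and ask for UTC output. The helper name utc_stamp is hypothetical, and the result only matches the logged time when the log's offset is +0000.

```shell
# utc_stamp: hypothetical helper; expects a string such as "28 Mar 2019 11:43:58 +0000"
utc_stamp() {
    # -u makes GNU date print in UTC, so the +0000 offset no longer shifts the time
    date -u -d "$1" "+%F %T"
}

utc_stamp "28 Mar 2019 11:43:58 +0000"   # prints 2019-03-28 11:43:58
```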

Results

awk -F"[][]" -v OFS=, '/Yealink/ {
    split($1,a," ")                                             # a[1] is the client IP
    gsub("/"," ",$2); sub(":"," ",$2); sub("\\+[0-9]+","",$2)   # massage $2 into a date value that date accepts
    cmd = "date -d\047" $2 "\047 \047+%F %T\047"
    if ( (cmd | getline line) > 0 ) $2 = line                   # replace $2 with the output of the system date
    close(cmd)
    print "Yealink",a[1],$2
}' access.log

Below is the file content

$ cat access.log
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
$
