简体   繁体   中英

How can I convert date in a column to a epoch timestamp in bash?

I have a file containing this:

2014-11-10 02:00:03,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
2014-11-10 02:00:07,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
2014-11-10 02:00:11,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

I want to convert the timestamp to a date in this format:

1415602803,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415602807,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415602811,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

How can I achieve this using awk or any other bash tool.

Background: I need to grab files from hdfs for a particular time range. Initially I was using hdfs dfs -stat to bring epoch time and then programming a filter

cat <fileWithFilePaths> | xargs -I ^ -P 15 sh -c 'printf "%s\n" "$(hdfs dfs -stat "%n,%Y,%b" ^)"'>output1

awk 'BEGIN{beginDE=ENVIRON["beginDataEpoch"]; endDE=ENVIRON["endDataEpoch"]; FS="@"} {if ($1 >= beginDE && $1 <= endDE) print $0}' output1

But later realized its using hdfs stat is expensive function. So now I am using hdfs dfs -ls to bring the file time but its in yyyy-mm-dd hh:mm format, thus I am seeking help in regards to change this in epoch time for calculations down the line.

If your log file is large, this GNU awk command will be much faster than a bash loop calling the external command date for each line:

gawk -vFS=, -vOFS=, '{ $1 = mktime(gensub("[-:]", " ", "g", $1)) } 1'
  • The first comma-delimited field is replaced with the result of mktime , a function (a GNU extension) that returns its parameter as a number of seconds since EPOCH.
  • gensub (a GNU extension) is used to turn " YYYY-MM-DD hh:mm:ss " into " YYYY MM DD hh mm ss " which is the format expected by mktime() .
  • The final 1 is a mere TRUE condition that outputs the line.

You can use awk along with its builtin gensub , mktime and sub to accomplish the same thing, eg

$ awk -F, '{ str=$0; sub($1, mktime(gensub(/[-:]/, " ", "g", $1)), str); print str }' file
1415606403,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415606407,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415606411,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

Explanation:

  • gensub(/[-:]/, " ", "g", $1) - replace all '-' and ':' in the first field with spaces;
  • mktime(...) - turn the resulting date into a timestamp ;
  • str=$0; sub($1, ..., str); - replace the first field with the timestamp, result in str ; and finally
  • print str - print it.

( note: Your timezone and my are 3-hours different. Your output should give you your desired results on your system. You can also adjust mktime for Daylight Savings (if needed) and you can use the builtin strftime with the utc-flag to adjust for UTC.)

I guess while read loop would be the simplest. The following:

while IFS=, read -r date rest; do
        printf "%s,%s\n" "$(date --date="$date" +%s)" "$rest"
done <<EOF
2014-11-10 02:00:03,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
2014-11-10 02:00:07,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
2014-11-10 02:00:11,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22
EOF

Will output:

1415581203,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415581207,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415581211,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

I had the same issue, when I needed to convert the date to unix time in some historical stock data. Used this sed and date command. Please note, that this only works on Linux (GNU implementation of sed and date ).

simon@debian:~/Downloads$ cat inputFile
2014-11-10 02:00:03,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
2014-11-10 02:00:07,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
2014-11-10 02:00:11,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$ sed 's/20[0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/echo `date --date="&" +"%s"`/e' inputFile > output
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$ cat output 
1415602803,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415602807,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415602811,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM