
Timestamp data manipulation in Unix

I have a CSV data file with two timestamp fields, start_time and end_time. They are strings of the form "2014-02-01 00:06:22". Each line of the data file is a record with multiple fields. The file is pretty small.

I want to calculate the average duration across all records. Rather than writing a full shell script, is there a one-liner command I could use for this kind of simple calculation, possibly with awk?

I'm very new to awk. Here's what I have, but it does not work. $6 and $7 are the fields for start_time and end_time.

awk -F, 'BEGIN { count=0 total=0 }
    { sec1=date +%s -d $6 sec2=date +%s -d $7
    total+=sec2-sec1 count++} 
    END {print "avg trip time: ", total/count}' dataset.csv

Sample of the CSV file:

"start_time","stop_time","start station name","end station name","bike_id"
"2014-02-01 00:00:00","2014-02-01 00:06:22","Washington Square E","Stanton St & Chrystie St","21101"

Using GNU awk for mktime() and gensub():

$ cat tst.awk
BEGIN { FS="^\"|\",\"" }
function t2s(time) { return mktime(gensub(/[-:]/," ","g",time)) }
NR>1 { totDurs += (t2s($3) - t2s($2)) }
END { print totDurs / (NR-1) }

$ gawk -f tst.awk file
382
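The script reads the timestamps from $2 and $3 rather than $1 and $2 because the field separator also matches the opening quote of each line, which leaves $1 empty. The sketch below prints how that FS splits the sample record; `show_fields` is a hypothetical helper name introduced only for this illustration, and it uses nothing beyond regex field splitting, so any awk should do.

```shell
# Hypothetical helper: print each field produced by the FS used in tst.awk.
show_fields() {
    awk 'BEGIN { FS = "^\"|\",\"" }
         { for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

printf '%s\n' '"2014-02-01 00:00:00","2014-02-01 00:06:22","Washington Square E","Stanton St & Chrystie St","21101"' |
    show_fields
# $1=[]
# $2=[2014-02-01 00:00:00]
# $3=[2014-02-01 00:06:22]
# $4=[Washington Square E]
# $5=[Stanton St & Chrystie St]
# $6=[21101"]
```

Note that the last field keeps its closing quote, which does not matter here because only $2 and $3 are used. gensub() then turns "2014-02-01 00:06:22" into "2014 02 01 00 06 22", the space-separated "YYYY MM DD HH MM SS" form that gawk's mktime() expects.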

With other awks you need to call the date command through the shell (the -d option as used here assumes GNU date):

$ cat tst2.awk
BEGIN { FS="^\"|\",\"" }
function t2s(time,      cmd,secs) {
    cmd = "date +%s -d \"" time "\""
    if ( (cmd | getline secs) <= 0 ) {
        secs = -1
    }
    close(cmd)
    return secs
}
NR>1 { totDurs += (t2s($3) - t2s($2)) }
END { print totDurs / (NR-1) }

$ awk -f tst2.awk file
382
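Calling date spawns a process for every timestamp and depends on the local date implementation. As an alternative that is not part of the original answer, the conversion can be done with plain awk arithmetic. The sketch below is a minimal version of that idea: it swaps in a standard days-since-1970 calendar formula, treats every timestamp as UTC (so daylight-saving shifts are ignored, which only matters for durations that span a clock change), and assumes years after 1 AD. The wrapper name `avg_duration` is hypothetical.

```shell
# Hypothetical helper: average (stop - start) in seconds for a quoted CSV
# whose first two columns are "YYYY-MM-DD HH:MM:SS" timestamps.
# Assumption: timestamps are treated as UTC; no time zone or DST handling.
avg_duration() {
    awk '
    BEGIN { FS = "^\"|\",\"" }

    # Convert "YYYY-MM-DD HH:MM:SS" to seconds since 1970-01-01 00:00:00.
    function t2s(time,    a, y, m, mp, era, yoe, doy, doe) {
        split(time, a, /[-: ]/)              # a[1..6] = Y M D h m s
        y = a[1] + 0; m = a[2] + 0
        if (m <= 2) y--                      # count the year from March
        mp  = (m > 2) ? m - 3 : m + 9        # March = 0 ... February = 11
        era = int(y / 400)                   # 400-year Gregorian cycle
        yoe = y - era * 400                  # year within the cycle
        doy = int((153 * mp + 2) / 5) + a[3] - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return (era * 146097 + doe - 719468) * 86400 \
               + a[4] * 3600 + a[5] * 60 + a[6]
    }

    NR > 1 { totDurs += t2s($3) - t2s($2) }  # skip the header line
    END    { if (NR > 1) print totDurs / (NR - 1) }
    ' "$1"
}
```

Run as `avg_duration file` on the sample data above, this should print 382, the same result as the two scripts in the answer.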
