Microsecond difference between two datetime.time columns in python pandas?
I have a python pandas data frame, which contains 2 columns: time1 and time2:
time1 time2
13:00:07.294234 13:00:07.294234
14:00:07.294234 14:00:07.394234
15:00:07.294234 15:00:07.494234
16:00:07.294234 16:00:07.694234
How can I generate a third column which contains the microsecond difference between time1 and time2, as an integer if possible?
If you prepend these with an actual date you can convert them to datetime64 columns:
In [11]: '2014-03-19 ' + df
Out[11]:
time1 time2
0 2014-03-19 13:00:07.294234 2014-03-19 13:00:07.294234
1 2014-03-19 14:00:07.294234 2014-03-19 14:00:07.394234
2 2014-03-19 15:00:07.294234 2014-03-19 15:00:07.494234
3 2014-03-19 16:00:07.294234 2014-03-19 16:00:07.694234
[4 rows x 2 columns]
In [12]: df = ('2014-03-19 ' + df).astype('datetime64[ns]')
Out[12]:
time1 time2
0 2014-03-19 20:00:07.294234 2014-03-19 20:00:07.294234
1 2014-03-19 21:00:07.294234 2014-03-19 21:00:07.394234
2 2014-03-19 22:00:07.294234 2014-03-19 22:00:07.494234
3 2014-03-19 23:00:07.294234 2014-03-19 23:00:07.694234
Now you can subtract these columns:
In [13]: delta = df['time2'] - df['time1']
In [14]: delta
Out[14]:
0 00:00:00
1 00:00:00.100000
2 00:00:00.200000
3 00:00:00.400000
dtype: timedelta64[ns]
To get the number of microseconds, just divide the underlying nanoseconds by 1000 (using integer division to keep the int64 dtype):
In [15]: delta.astype(np.int64) // 10**3
Out[15]:
0 0
1 100000
2 200000
3 400000
dtype: int64
As Jeff points out, on recent versions of numpy you can divide by 1 microsecond:
In [16]: delta / np.timedelta64(1, 'us')
Out[16]:
0 0
1 100000
2 200000
3 400000
dtype: float64
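The steps of this answer can be put together as a small self-contained sketch. It assumes the two columns hold time strings and that both times fall on the same day; the date 2014-03-19 is arbitrary, and the names stamped and micros are only illustrative. Rounding before the cast is a precaution, since the division returns floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time1": ["13:00:07.294234", "14:00:07.294234",
              "15:00:07.294234", "16:00:07.294234"],
    "time2": ["13:00:07.294234", "14:00:07.394234",
              "15:00:07.494234", "16:00:07.694234"],
})

# Prepend an arbitrary date so each value parses as a full timestamp
# (assumption: time1 and time2 belong to the same calendar day).
stamped = ("2014-03-19 " + df).apply(pd.to_datetime)

# Subtracting two datetime64 columns gives a timedelta64 column.
delta = stamped["time2"] - stamped["time1"]

# Dividing by one microsecond gives floats; round and cast to get integers.
micros = (delta / np.timedelta64(1, "us")).round().astype("int64")
print(micros.tolist())
```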
The easiest way is just to do this:
(pd.to_datetime(df['time2']) - pd.to_datetime(df['time1'])) / np.timedelta64(1, 'us')
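A variant of the same idea, sketched under the assumption that every value is a string with a fractional-seconds part: passing an explicit format to pd.to_datetime avoids relying on format inference. With a time-only format the date part defaults to 1900-01-01 on both sides, so it cancels out in the subtraction. The name fmt is illustrative.

```python
import numpy as np
import pandas as pd

time1 = pd.Series(["13:00:07.294234", "14:00:07.294234",
                   "15:00:07.294234", "16:00:07.294234"])
time2 = pd.Series(["13:00:07.294234", "14:00:07.394234",
                   "15:00:07.494234", "16:00:07.694234"])

# Explicit format (assumption: all strings look like HH:MM:SS.ffffff).
fmt = "%H:%M:%S.%f"

# Both columns get the same default date, so only the time difference remains.
diff_us = (pd.to_datetime(time2, format=fmt)
           - pd.to_datetime(time1, format=fmt)) / np.timedelta64(1, "us")
print(diff_us.tolist())
```

If some values lack the fractional part, this format would not match them and a different format or a parsing fallback would be needed.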
At first I thought there were no correct answers here because there were no green ticks. But as pointed out by Jeff in the comments, I was wrong.
Either way, here is my contribution.
First, the obvious: turning the datetime.time into a timedelta:
df['delta'] = (pd.to_timedelta(df.time2.astype(str)) - pd.to_timedelta(df.time1.astype(str)))
time1 time2 delta
0 13:00:07.294234 13:00:07.294234 00:00:00
1 14:00:07.294234 14:00:07.394234 00:00:00.100000
2 15:00:07.294234 15:00:07.494234 00:00:00.200000
3 16:00:07.294234 16:00:07.694234 00:00:00.400000
Now that we have the timedelta, we can simply divide it by one microsecond to get the number of microseconds:
df['microsecond_delta'] = df.delta / np.timedelta64(1, 'us')
time1 time2 delta microsecond_delta
0 13:00:07.294234 13:00:07.294234 00:00:00 0
1 14:00:07.294234 14:00:07.394234 00:00:00.100000 100000
2 15:00:07.294234 15:00:07.494234 00:00:00.200000 200000
3 16:00:07.294234 16:00:07.694234 00:00:00.400000 400000
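For columns that really hold datetime.time objects, as the question title suggests, the two steps might look like the sketch below. The sample values are shortened to two rows, and the final round-and-cast to int64 is an addition made here to get the integer column the question asks for.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Two rows of genuine datetime.time objects (object dtype columns).
df = pd.DataFrame({
    "time1": [dt.time(13, 0, 7, 294234), dt.time(14, 0, 7, 294234)],
    "time2": [dt.time(13, 0, 7, 294234), dt.time(14, 0, 7, 394234)],
})

# str(datetime.time) gives 'HH:MM:SS.ffffff', which pd.to_timedelta can parse.
df["delta"] = (pd.to_timedelta(df["time2"].astype(str))
               - pd.to_timedelta(df["time1"].astype(str)))

# Divide by one microsecond, then round and cast to get an integer column.
df["microsecond_delta"] = (
    (df["delta"] / np.timedelta64(1, "us")).round().astype("int64")
)
print(df["microsecond_delta"].tolist())
```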
I have to add that this is very counterintuitive, but it seems to be the only way. There seems to be no way of accessing the milliseconds directly. I tried applying lambda functions like:
df.delta.apply(lambda x: x.microseconds)
AttributeError: 'numpy.timedelta64' object has no attribute 'microseconds'
The same is true for seconds, nanoseconds, milliseconds and so on...
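As far as I know, later pandas versions expose these fields through the .dt accessor on timedelta columns, so the apply call is not needed there. A minimal sketch, with one delta above a second added to show the difference: .dt.microseconds returns only the sub-second component, while .dt.total_seconds() covers the whole duration. The name total_us is illustrative.

```python
import pandas as pd

# Sample deltas; the last one exceeds one second on purpose.
delta = pd.Series(pd.to_timedelta(["00:00:00",
                                   "00:00:00.100000",
                                   "00:00:01.400000"]))

# Component access: microseconds within the current second (0 to 999999).
component_us = delta.dt.microseconds
print(component_us.tolist())

# Whole duration in microseconds: total seconds scaled, rounded, cast to int.
total_us = (delta.dt.total_seconds() * 1_000_000).round().astype("int64")
print(total_us.tolist())
```

For the data in the question every difference is below one second, so both give the same numbers.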
Using dateutil you could transform your timestamp columns to 'real' timestamps:
df.time1 = df.time1.apply(dateutil.parser.parse)
df.time2 = df.time2.apply(dateutil.parser.parse)
After that you can define a new column like this:
df['delta'] = df.time2 - df.time1
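A self-contained sketch of the dateutil route, assuming the columns hold time strings. By default dateutil fills the missing date from the current day; the default argument used here pins it to a fixed date so both columns are certain to share it. The helper parse_on_fixed_day and the date 2014-03-19 are hypothetical, chosen only for illustration.

```python
from datetime import datetime

import dateutil.parser
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time1": ["13:00:07.294234", "14:00:07.294234"],
    "time2": ["13:00:07.294234", "14:00:07.394234"],
})

# Hypothetical fixed date used to fill in the missing date fields.
FIXED_DAY = datetime(2014, 3, 19)

def parse_on_fixed_day(text):
    """Parse a time string, taking the date part from FIXED_DAY."""
    return dateutil.parser.parse(text, default=FIXED_DAY)

df["time1"] = df["time1"].apply(parse_on_fixed_day)
df["time2"] = df["time2"].apply(parse_on_fixed_day)

# Timestamp columns subtract to a timedelta column.
df["delta"] = df["time2"] - df["time1"]

# Convert to integer microseconds by dividing by one microsecond.
df["microsecond_delta"] = (
    (df["delta"] / np.timedelta64(1, "us")).round().astype("int64")
)
print(df["microsecond_delta"].tolist())
```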