Convert timedelta64[ns] column to seconds in Python Pandas DataFrame
A pandas DataFrame column duration contains timedelta64[ns] values, as shown. How can you convert them to seconds?
0 00:20:32
1 00:23:10
2 00:24:55
3 00:13:17
4 00:18:52
Name: duration, dtype: timedelta64[ns]
I tried the following:
print df[:5]['duration'] / np.timedelta64(1, 's')
but got the error:
Traceback (most recent call last):
File "test.py", line 16, in <module>
print df[0:5]['duration'] / np.timedelta64(1, 's')
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 130, in wrapper
"addition and subtraction, but the operator [%s] was passed" % name)
TypeError: can only operate on a timedeltas for addition and subtraction, but the operator [__div__] was passed
I also tried:
print df[:5]['duration'].astype('timedelta64[s]')
but received the error:
Traceback (most recent call last):
File "test.py", line 17, in <module>
print df[:5]['duration'].astype('timedelta64[s]')
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 934, in astype
values = com._astype_nansafe(self.values, dtype)
File "C:\Python27\lib\site-packages\pandas\core\common.py", line 1653, in _astype_nansafe
raise TypeError("cannot astype a timedelta from [%s] to [%s]" % (arr.dtype,dtype))
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[s]]
This works properly in the current version of Pandas (version 0.14):
In [132]: df[:5]['duration'] / np.timedelta64(1, 's')
Out[132]:
0 1232
1 1390
2 1495
3 797
4 1132
Name: duration, dtype: float64
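As a self-contained sketch of the division approach, the snippet below rebuilds the five durations from the question and divides them by a one-second timedelta. The construction with pd.to_timedelta is an assumption made for illustration, since the original df is not shown.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration: the five durations from the question.
df = pd.DataFrame({
    "duration": pd.to_timedelta(
        ["00:20:32", "00:23:10", "00:24:55", "00:13:17", "00:18:52"]
    )
})

# Dividing by a one-second timedelta yields plain floats (seconds).
seconds = df["duration"] / np.timedelta64(1, "s")
print(seconds.tolist())  # [1232.0, 1390.0, 1495.0, 797.0, 1132.0]
```

Swapping the divisor for np.timedelta64(1, "m") or np.timedelta64(1, "h") should give minutes or hours in the same way.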
Here is a workaround for older versions of Pandas/NumPy:
In [131]: df[:5]['duration'].values.view('<i8')/10**9
Out[131]: array([1232, 1390, 1495, 797, 1132], dtype=int64)
timedelta64 and datetime64 data are stored internally as 8-byte ints (dtype '<i8'). So the above views the timedelta64 values as 8-byte ints and then does integer division to convert nanoseconds to seconds. (This relies on Python 2, as used in the question; in Python 3, / on an integer array performs true division, so use // to keep the result as integers.)
Note that you need NumPy version 1.7 or newer to work with datetime64/timedelta64s.
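A minimal Python 3 sketch of the same integer-view idea follows. The sample data is assumed for illustration, and the explicit cast to timedelta64[ns] is an added precaution so that the integers being viewed really are nanoseconds.

```python
import pandas as pd

# Sample data assumed for illustration.
durations = pd.Series(pd.to_timedelta(["00:20:32", "00:23:10", "00:13:17"]))

# Force nanosecond resolution so the integer view really holds nanoseconds.
nanos = durations.astype("timedelta64[ns]").to_numpy().view("<i8")

# Floor division keeps the result as integers (Python 3's / would return floats).
whole_seconds = nanos // 10**9
print(whole_seconds.tolist())  # [1232, 1390, 797]
```

Floor division discards any sub-second part, so this is only a drop-in replacement when whole seconds are wanted.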
Use the Series dt accessor to get access to the methods and attributes of a datetime (timedelta) series.
>>> s
0 -1 days +23:45:14.304000
1 -1 days +23:46:57.132000
2 -1 days +23:49:25.913000
3 -1 days +23:59:48.913000
4 00:00:00.820000
dtype: timedelta64[ns]
>>>
>>> s.dt.total_seconds()
0 -885.696
1 -782.868
2 -634.087
3 -11.087
4 0.820
dtype: float64
There are other Pandas Series Accessors for String, Categorical, and Sparse data types.
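To try the accessor without the original data, here is a small sketch. The values are assumed for illustration: one negative, one sub-second, and one multi-day duration, built from integer milliseconds.

```python
import pandas as pd

# Sample data assumed for illustration, given in milliseconds.
s = pd.Series(pd.to_timedelta([-885696, 820, 236690558], unit="ms"))

# total_seconds() returns the whole signed duration as float seconds,
# including any days and fractional seconds.
secs = s.dt.total_seconds()
print(secs)
```

Negative durations stay negative, and durations longer than a day are not truncated.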
Just realized it's an old thread; leaving this here anyway in case wanderers like me click only on the top 5 search results and end up here.
Make sure that your types are correct.
If you want to convert a datetime to seconds, just sum the seconds contributed by the hour, minute, and second fields of the datetime object, provided the duration falls within a single date.
linear_df['duration'].dt.hour*3600 + linear_df['duration'].dt.minute*60 + linear_df['duration'].dt.second
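As a quick illustration of that sum on a datetime64 column, the sketch below computes seconds since midnight. The two timestamps and the variable names are assumed for illustration.

```python
import pandas as pd

# Sample timestamps assumed for illustration.
stamps = pd.Series(pd.to_datetime(["2016-12-30 17:47:33", "2016-12-31 09:33:27"]))

# Hours, minutes, and seconds each contribute their share of seconds since midnight.
since_midnight = stamps.dt.hour * 3600 + stamps.dt.minute * 60 + stamps.dt.second
print(since_midnight.tolist())  # [64053, 34407]
```

Note that .dt.hour, .dt.minute, and .dt.second are available on datetime columns; a timedelta column exposes different fields through the accessor.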
If you want to convert a timedelta to seconds, use the following instead:
linear_df[:5]['duration'].astype('timedelta64[s]')
I got it to work like this:
The start_dt and end_dt columns are in this format:
import datetime
linear_df[:5]['start_dt']
0 1970-02-22 21:32:48.000
1 2016-12-30 17:47:33.216
2 2016-12-31 09:33:27.931
3 2016-12-31 09:52:53.486
4 2016-12-31 10:29:44.611
Name: start_dt, dtype: datetime64[ns]
My duration was in timedelta64[ns] format, the result of subtracting the start datetime values from the end datetime values.
linear_df['duration'] = linear_df['end_dt'] - linear_df['start_dt']
The resulting duration column looks like this:
linear_df[:5]['duration']
0 0 days 00:00:14
1 2 days 17:44:50.558000
2 0 days 15:37:28.418000
3 0 days 18:45:45.727000
4 0 days 19:21:27.159000
Name: duration, dtype: timedelta64[ns]
Using pandas, I got the duration in seconds between two dates as floats. That makes it easier to compare or filter the durations afterwards.
linear_df[:5]['duration'].astype('timedelta64[s]')
0 14.0
1 236690.0
2 56248.0
3 67545.0
4 69687.0
Name: duration, dtype: float64
In my case, I wanted to get all durations longer than 1 second.
Hope it helps.
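The filtering step mentioned above could look like the following sketch. The frame contents and the duration_s column name are hypothetical. It uses dt.total_seconds() rather than the astype('timedelta64[s]') cast, because newer pandas releases may return a timedelta dtype from that cast instead of the floats shown above.

```python
import pandas as pd

# Hypothetical frame mirroring the start_dt / end_dt columns described above.
linear_df = pd.DataFrame({
    "start_dt": pd.to_datetime(
        ["2016-12-30 17:47:33", "2016-12-31 09:33:27", "2016-12-31 09:52:53"]
    ),
    "end_dt": pd.to_datetime(
        ["2016-12-30 17:47:47", "2016-12-31 09:33:27", "2017-01-01 01:30:21"]
    ),
})

# Subtracting two datetime columns gives a timedelta column.
linear_df["duration"] = linear_df["end_dt"] - linear_df["start_dt"]

# Float seconds are easy to compare against a numeric threshold.
linear_df["duration_s"] = linear_df["duration"].dt.total_seconds()
longer = linear_df[linear_df["duration_s"] > 1]
print(longer.index.tolist())  # [0, 2]
```

Comparing the timedelta column directly against pd.Timedelta(seconds=1) should select the same rows without the extra column.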
Use the total_seconds() function:
df['durationSeconds'] = df['duration'].dt.total_seconds()
We can simply use the pandas apply() function:
def get_seconds(time_delta):
    # Seconds component only (0 to 86399); whole days are in time_delta.days
    return time_delta.seconds

def get_microseconds(time_delta):
    # The attribute is spelled "microseconds"
    return time_delta.microseconds

time_delta_series = df['duration']
converted_series = time_delta_series.apply(get_seconds)
print(converted_series)
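One caveat with the .seconds attribute is that it holds only the seconds component of a duration, not its full length. The sketch below, with sample values assumed for illustration, shows the difference for a duration longer than a day and rebuilds the full length from the days and seconds fields.

```python
import pandas as pd

# Sample data assumed for illustration: 14 seconds, and 2 days 17:44:50.
durations = pd.Series([
    pd.Timedelta(seconds=14),
    pd.Timedelta(days=2, hours=17, minutes=44, seconds=50),
])

# .dt.seconds is the seconds component only; whole days live in .dt.days.
component = durations.dt.seconds

# Rebuild the full length in whole seconds (cast to int64 to avoid narrow ints).
full = durations.dt.days.astype("int64") * 86400 + durations.dt.seconds

print(component.tolist())  # [14, 63890]
print(full.tolist())       # [14, 236690]
```

For durations under one day the two agree, which is why the apply() version works on the data in the question.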
Well, the answers didn't age well. Here is a simpler solution:
df.duration.dt.total_seconds()