
Convert timedelta64[ns] column to seconds in Python Pandas DataFrame

A pandas DataFrame column duration contains timedelta64[ns] values, as shown. How can you convert them to seconds?

0   00:20:32
1   00:23:10
2   00:24:55
3   00:13:17
4   00:18:52
Name: duration, dtype: timedelta64[ns]

I tried the following:

print df[:5]['duration'] / np.timedelta64(1, 's')

but got the error:

Traceback (most recent call last):
  File "test.py", line 16, in <module>
    print df[0:5]['duration'] / np.timedelta64(1, 's')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 130, in wrapper
    "addition and subtraction, but the operator [%s] was passed" % name)
TypeError: can only operate on a timedeltas for addition and subtraction, but the operator [__div__] was passed

I also tried:

print df[:5]['duration'].astype('timedelta64[s]')

but received the error:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print df[:5]['duration'].astype('timedelta64[s]')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 934, in astype
    values = com._astype_nansafe(self.values, dtype)
  File "C:\Python27\lib\site-packages\pandas\core\common.py", line 1653, in _astype_nansafe
    raise TypeError("cannot astype a timedelta from [%s] to [%s]" % (arr.dtype,dtype))
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[s]]

This works properly in the current version of Pandas (version 0.14):

In [132]: df[:5]['duration'] / np.timedelta64(1, 's')
Out[132]: 
0    1232
1    1390
2    1495
3     797
4    1132
Name: duration, dtype: float64
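For anyone who wants to reproduce this without the original data, here is a minimal self-contained sketch. The DataFrame is rebuilt from the five values shown in the question, and the variable name seconds is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's column from its printed values (assumed sample data).
df = pd.DataFrame({
    "duration": pd.to_timedelta(
        ["00:20:32", "00:23:10", "00:24:55", "00:13:17", "00:18:52"]
    )
})

# Dividing by a one-second timedelta yields a float64 Series of seconds.
seconds = df["duration"] / np.timedelta64(1, "s")
print(seconds.tolist())
```

The divisor sets the unit, so np.timedelta64(1, 'm') would give minutes instead.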

Here is a workaround for older versions of Pandas/NumPy:

In [131]: df[:5]['duration'].values.view('<i8')/10**9
Out[131]: array([1232, 1390, 1495,  797, 1132], dtype=int64)

timedelta64 and datetime64 data are stored internally as 8-byte ints (dtype '<i8'). So the above views the timedelta64 values as 8-byte ints and then divides by 10**9 to convert nanoseconds to seconds. This is integer division under Python 2; on Python 3, use // to keep integer results.
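A small sketch of the same idea that should also run on Python 3. The sample values are taken from the question, and the explicit cast to timedelta64[ns] is my own precaution so that the integers really are nanoseconds regardless of how the column was stored.

```python
import pandas as pd

# Assumed sample data: three of the durations from the question.
durations = pd.Series(pd.to_timedelta(["00:20:32", "00:23:10", "00:24:55"]))

# Force nanosecond resolution so the raw integers are nanoseconds.
raw = durations.to_numpy().astype("timedelta64[ns]")

# Reinterpret the 8-byte values as int64, then floor-divide ns -> s.
whole_seconds = raw.view("int64") // 10**9
print(whole_seconds.tolist())
```

Floor division discards any fractional seconds, so this only suits cases where whole seconds are enough.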

Note that you need NumPy version 1.7 or newer to work with datetime64/timedelta64 data.

Use the Series dt accessor to get access to the methods and attributes of a datetime (timedelta) series.

>>> s
0   -1 days +23:45:14.304000
1   -1 days +23:46:57.132000
2   -1 days +23:49:25.913000
3   -1 days +23:59:48.913000
4            00:00:00.820000
dtype: timedelta64[ns]
>>>
>>> s.dt.total_seconds()
0   -885.696
1   -782.868
2   -634.087
3    -11.087
4      0.820
dtype: float64

There are other Pandas Series accessors for String, Categorical, and Sparse data types.

Just realized it's an old thread; leaving this here anyway in case wanderers like me click only on the top 5 search results and end up here.

Make sure that your types are correct.

  • If you want to convert a datetime to seconds, just sum the seconds contributed by the hour, minute and second of each datetime value, provided the duration falls within a single date.

      • hours: hours x 3600 = seconds
      • minutes: minutes x 60 = seconds
      • seconds: seconds

linear_df['duration'].dt.hour*3600 + linear_df['duration'].dt.minute*60 + linear_df['duration'].dt.second

  • If you want to convert a timedelta to seconds, use the one below.

linear_df[:5]['duration'].astype('timedelta64[s]')
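To make the two cases concrete, here is a rough sketch with made-up sample values (stamps and deltas are hypothetical names, not from my data). For the timedelta case it uses dt.total_seconds() rather than astype, since as far as I know the result of astype('timedelta64[s]') differs between pandas versions: older ones return floats, newer ones may keep a timedelta dtype.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
stamps = pd.Series(pd.to_datetime(["2016-12-31 09:33:27", "2016-12-31 10:29:44"]))
deltas = pd.Series(pd.to_timedelta(["0 days 00:00:14", "0 days 15:37:28"]))

# datetime64 column: seconds elapsed since midnight of each timestamp's own date.
seconds_of_day = stamps.dt.hour * 3600 + stamps.dt.minute * 60 + stamps.dt.second

# timedelta64 column: total length of each duration in seconds (float64).
duration_seconds = deltas.dt.total_seconds()

print(seconds_of_day.tolist())
print(duration_seconds.tolist())
```

Note that dt.hour, dt.minute and dt.second exist only on datetime columns; a timedelta column does not have them.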

I got it to work like this:

The start_dt and end_dt columns are in this format:

import datetime

linear_df[:5]['start_dt']

0   1970-02-22 21:32:48.000
1   2016-12-30 17:47:33.216
2   2016-12-31 09:33:27.931
3   2016-12-31 09:52:53.486
4   2016-12-31 10:29:44.611
Name: start_dt, dtype: datetime64[ns]

I had my duration in timedelta64[ns] format, obtained by subtracting the start datetime values from the end datetime values.

linear_df['duration'] = linear_df['end_dt'] - linear_df['start_dt']

The resulting duration column looks like this:

linear_df[:5]['duration']

0          0 days 00:00:14
1   2 days 17:44:50.558000
2   0 days 15:37:28.418000
3   0 days 18:45:45.727000
4   0 days 19:21:27.159000
Name: duration, dtype: timedelta64[ns]

Using pandas, I got the duration between the two dates in seconds, as a float. That makes it easier to compare or filter the durations afterwards.

linear_df[:5]['duration'].astype('timedelta64[s]')

0        14.0
1    236690.0
2     56248.0
3     67545.0
4     69687.0
Name: duration, dtype: float64

In my case, I wanted to get all durations longer than 1 second.
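A sketch of that filter, assuming a small made-up frame with the same column names (the timestamps are invented for illustration). It uses dt.total_seconds() to get the float seconds, which I believe behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical data using the same column names as above.
linear_df = pd.DataFrame({
    "start_dt": pd.to_datetime([
        "2016-12-30 17:47:33.000",
        "2016-12-31 09:33:27.000",
        "2016-12-31 10:00:00.000",
    ]),
    "end_dt": pd.to_datetime([
        "2017-01-02 11:32:23.000",
        "2016-12-31 09:33:27.500",
        "2016-12-31 10:00:14.000",
    ]),
})

# Subtracting two datetime columns gives a timedelta column.
linear_df["duration"] = linear_df["end_dt"] - linear_df["start_dt"]

# Convert to float seconds, then keep rows lasting more than one second.
duration_s = linear_df["duration"].dt.total_seconds()
longer_than_1s = linear_df[duration_s > 1]
print(longer_than_1s)
```

Comparing against pd.Timedelta(seconds=1) directly would also work if you don't need the float column.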

Hope it helps.

Use the total_seconds() function:

df['durationSeconds'] = df['duration'].dt.total_seconds()

We can simply use the pandas apply() function:

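def get_seconds(time_delta):
    # total_seconds() covers the whole duration; .seconds alone drops the days part
    return time_delta.total_seconds()

def get_microseconds(time_delta):
    # the attribute is .microseconds (the sub-second component only)
    return time_delta.microseconds

time_delta_series = df['duration']

converted_series = time_delta_series.apply(get_seconds)
print(converted_series)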
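One thing to watch with the apply() approach: a Timedelta's .seconds attribute holds only the seconds component (0 to 86399) and ignores whole days, whereas total_seconds() covers the full duration. A quick sketch with two made-up durations shows the difference.

```python
import pandas as pd

# Hypothetical durations: one under a day, one spanning more than two days.
durations = pd.Series(pd.to_timedelta(["0 days 00:00:14", "2 days 17:44:50"]))

# .seconds is just the seconds component; the 2 days are silently dropped.
component = durations.apply(lambda td: td.seconds)

# total_seconds() includes the days.
total = durations.apply(lambda td: td.total_seconds())

print(component.tolist())
print(total.tolist())
```

The two agree only while every duration is non-negative and shorter than one day.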

Well, the answers didn't age well. Here is a simpler solution:

df.duration.dt.total_seconds()

