
pd.Timedelta conversion on a dataframe column

I am trying to convert a dataframe column to timedeltas but am having issues. The values in the column look like '+XX:XX:XX' or '-XX:XX:XX'.

My dataframe:

    import pandas as pd

    df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00']})

My approach:

    df['time'] = pd.Timedelta(df['time'])

However, I get the error:

    ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible

When I try a simpler example:

    time = pd.Timedelta('+06:00:00')

I get my desired output:

    Timedelta('0 days 06:00:00')

How can I convert a whole Series to timedeltas and get this output for every value?

I would strongly recommend using the purpose-built, vectorized (i.e. much faster) function pd.to_timedelta():

    In [40]: pd.to_timedelta(df['time'])
    Out[40]:
    0            06:00:00
    1   -1 days +20:00:00
    Name: time, dtype: timedelta64[ns]
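To write the converted values back into the dataframe, the result can be assigned to the column. The sketch below also passes `errors='coerce'` so that a malformed string becomes `NaT` instead of raising. The third row (`'bad value'`) is a made-up example that is not part of the original question.

```python
import pandas as pd

# 'bad value' is a hypothetical malformed entry added for illustration
df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00', 'bad value']})

# Vectorized conversion; errors='coerce' maps unparseable strings to NaT
df['time'] = pd.to_timedelta(df['time'], errors='coerce')

print(df['time'])
print(df['time'].isna().tolist())
```

Without `errors='coerce'` the default is to raise on the first value that cannot be parsed, which may be preferable when bad input should not pass silently.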

Timing against a 200K-row DataFrame:

    In [41]: df = pd.concat([df] * 10**5, ignore_index=True)

    In [42]: df.shape
    Out[42]: (200000, 1)

    In [43]: %timeit pd.to_timedelta(df['time'])
    1 loop, best of 3: 891 ms per loop

    In [44]: %timeit df['time'].apply(pd.Timedelta)
    1 loop, best of 3: 7.15 s per loop

    In [45]: %timeit [pd.Timedelta(x) for x in df['time']]
    1 loop, best of 3: 5.52 s per loop

The error is pretty clear:

    ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible

What you are passing to pd.Timedelta() is none of the above data types:

    >>> type(df['time'])
    <class 'pandas.core.series.Series'>
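Put differently, pd.Timedelta() builds a single scalar value, so it accepts one element of the Series but not the Series itself. A minimal sketch of that difference follows. The variable names are illustrative, and the `except` clause also catches `TypeError` as a precaution in case the exception type differs between pandas versions.

```python
import pandas as pd

s = pd.Series(['+06:00:00', '-04:00:00'])

try:
    pd.Timedelta(s)  # a whole Series is not a valid scalar input
    failed = False
except (ValueError, TypeError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# A single element is a plain string, which the constructor accepts
one = pd.Timedelta(s.iloc[0])
print(one)
```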

You probably want:

    >>> [pd.Timedelta(x) for x in df['time']]
    [Timedelta('0 days 06:00:00'), Timedelta('-1 days +20:00:00')]

Or:

    >>> df['time'].apply(pd.Timedelta)
    0            06:00:00
    1   -1 days +20:00:00
    Name: time, dtype: timedelta64[ns]
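The `-1 days +20:00:00` shown for '-04:00:00' is still minus four hours. pd.Timedelta subclasses the standard library's datetime.timedelta, which normalizes negative durations so that only the days component is negative. A small standard-library sketch of that normalization, offered as an illustration rather than as a description of pandas internals:

```python
from datetime import timedelta

delta = timedelta(hours=-4)

# Normalized form: -1 day plus 20 hours (72000 seconds)
print(delta)                      # -1 day, 20:00:00
print(delta.days, delta.seconds)  # -1 72000

# total_seconds() recovers the signed duration
print(delta.total_seconds())      # -14400.0
```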

See more examples in the docs.

