
Converting timedelta64[ns] to decimal

I have a data frame in which I calculate the blockTime column as the difference between endDate and startDate. It gives me results like 0 days 01:45:00, but I need the duration as a decimal number of hours; in this case, 1.75.
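The conversion itself is plain arithmetic: 1 h 45 min = 1 + 45/60 = 1.75 h. As a minimal sketch, pandas can also do this by dividing one Timedelta by a one-hour Timedelta, which yields a float. This is an alternative idiom to dividing the total seconds by 3600, shown here only for the single value from the example.

```python
import pandas as pd

# 1 hour 45 minutes, as in the example above
block = pd.Timedelta(hours=1, minutes=45)

# Dividing one Timedelta by another gives a plain float ratio
hours = block / pd.Timedelta(hours=1)
print(hours)  # 1.75
```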

My df is as follows:

import pandas as pd

data = {'endDate': ['01/10/2020 15:23', '01/10/2020 16:31', '01/10/2020 16:20', '01/10/2020 11:00'],
        'startDate': ['01/10/2020 13:38', '01/10/2020 14:49', '01/10/2020 14:30', '01/10/2020 14:30']
        }

df = pd.DataFrame(data, columns=['endDate', 'startDate'])

# Dates are month/day/year, so 01/10/2020 is 10 January 2020
df['endDate'] = pd.to_datetime(df['endDate'], format='%m/%d/%Y %H:%M')
df['startDate'] = pd.to_datetime(df['startDate'], format='%m/%d/%Y %H:%M')

# Subtracting two datetime64 columns gives a timedelta64[ns] column
df['blockTime'] = df['endDate'] - df['startDate']

df = df.reindex(columns=['startDate', 'endDate', 'blockTime'])

The desired result is a data frame like the one below. Note that if the difference is negative, I need to flag it as incorrect somehow; I thought -999 might be ideal.

startDate           endDate               blockTime                 desiredResult
2020-01-10 13:38:00 2020-01-10 15:23:00   0 days 01:45:00           1.75
2020-01-10 14:49:00 2020-01-10 16:31:00   0 days 01:42:00           1.70
2020-01-10 14:30:00 2020-01-10 16:20:00   0 days 01:50:00           1.83
2020-01-10 14:30:00 2020-01-10 11:00:00  -1 days +20:30:00          -999.00
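Why the last row reads -1 days +20:30:00: the end time is 3.5 hours before the start time, and pandas displays a negative duration as a negative whole number of days plus a non-negative remainder (-1 day + 20.5 hours = -3.5 hours). A minimal check on that row using pandas Timestamps:

```python
import pandas as pd

# Fourth row of the question: the end time is earlier than the start time
end = pd.Timestamp('2020-01-10 11:00')
start = pd.Timestamp('2020-01-10 14:30')
delta = end - start

print(delta)                         # -1 days +20:30:00
print(delta.total_seconds() / 3600)  # -3.5
```

So the underlying value is simply -3.5 hours; only its printed form looks odd.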

0 days 01:45:00 is just how pandas displays a timedelta when you print the data frame. If you want to store the number of hours as a float instead of the whole timedelta object, timedelta objects have a total_seconds() method; divide its result by 3600 like so:

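def td2hours(tdobject):
    # Flag negative durations (end before start) with the -999 sentinel
    if tdobject.total_seconds() < 0:
        return -999
    # Convert seconds to decimal hours
    return tdobject.total_seconds() / 3600

# apply() calls td2hours once per row, passing a Timedelta
df['blockTime'] = (df['endDate'] - df['startDate']).apply(td2hours)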
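A quick sanity check of td2hours on two single pandas Timedelta values, one positive and one negative, taken from the question's third and fourth rows:

```python
import pandas as pd

def td2hours(tdobject):
    # Same logic as td2hours above: -999 for negative durations, decimal hours otherwise
    if tdobject.total_seconds() < 0:
        return -999
    return tdobject.total_seconds() / 3600

# Third row: 14:30 to 16:20 is 1 h 50 min
positive = td2hours(pd.Timedelta(hours=1, minutes=50))
# Fourth row: 14:30 to 11:00 is -3 h 30 min
negative = td2hours(pd.Timedelta(hours=-3, minutes=-30))

print(round(positive, 2), negative)  # 1.83 -999
```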

Or, as Gustavo suggested in the comments, you can avoid apply(). This is faster on large datasets, because the conversion runs on the whole column at once instead of calling a Python function for every row:

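# Convert the whole column to hours at once; copy=True guarantees a writable array
# (under pandas Copy-on-Write, to_numpy() may return a read-only view)
blockTime = ((df['endDate'] - df['startDate']).dt.total_seconds() / 3600).to_numpy(copy=True)
# Overwrite negative durations with the -999 sentinel
blockTime[blockTime < 0] = -999
df['blockTime'] = blockTime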
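A pandas-only variant of the same vectorized idea, sketched here as an alternative: it swaps total_seconds() / 3600 for division by a one-hour Timedelta, and the NumPy mask for Series.where. It rebuilds the question's four rows so it runs on its own.

```python
import pandas as pd

# Same four rows as in the question, parsed month-first
fmt = '%m/%d/%Y %H:%M'
df = pd.DataFrame({
    'endDate': pd.to_datetime(['01/10/2020 15:23', '01/10/2020 16:31',
                               '01/10/2020 16:20', '01/10/2020 11:00'], format=fmt),
    'startDate': pd.to_datetime(['01/10/2020 13:38', '01/10/2020 14:49',
                                 '01/10/2020 14:30', '01/10/2020 14:30'], format=fmt),
})

# A timedelta64 Series divided by a one-hour Timedelta gives float64 hours
hours = (df['endDate'] - df['startDate']) / pd.Timedelta(hours=1)

# Keep non-negative values; replace negative ones with the -999 sentinel
df['blockTime'] = hours.where(hours >= 0, -999)
print(df)
```

Series.where keeps values where the condition is True and substitutes the second argument elsewhere, so no intermediate NumPy array has to be modified in place.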

Output:

              endDate           startDate   blockTime
0 2020-01-10 15:23:00 2020-01-10 13:38:00    1.750000
1 2020-01-10 16:31:00 2020-01-10 14:49:00    1.700000
2 2020-01-10 16:20:00 2020-01-10 14:30:00    1.833333
3 2020-01-10 11:00:00 2020-01-10 14:30:00 -999.000000
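The question's desiredResult column shows two decimals (1.83), while the output above shows 1.833333. If that presentation matters, Series.round(2) rounds the stored values. A minimal sketch on the printed values (110/60 stands in for the unrounded 1.833333...):

```python
import pandas as pd

# blockTime values from the output above
block_time = pd.Series([1.75, 1.70, 110 / 60, -999.0], name='blockTime')

# Round to two decimals, matching the desiredResult column
rounded = block_time.round(2)
print(rounded.tolist())  # [1.75, 1.7, 1.83, -999.0]
```

Note that round() changes the stored numbers; keep the unrounded column if later calculations need full precision.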

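Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM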