Pandas: Calculate the difference between two Datetime columns from different timezones
pandas calculate difference between two columns
I am new to Stack Overflow.
For each id and month, I want to calculate the number of hours between two timestamps (beg and end). What is the best way to get this?
import pandas as pd
df = pd.DataFrame({'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
                   'beg': ['2021-01-01 00:00:00',
                           '2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-02-05 00:00:00',
                           '2021-02-06 00:00:00', '2021-03-05 00:00:00', '2021-04-01 00:00:00'],
                   'end': ['2021-01-02 00:00:00',
                           '2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-02-05 10:00:00',
                           '2021-02-06 10:00:00', '2021-03-07 10:00:00', '2021-05-08 00:00:00']})
Expected output:
x1 01/2021 24
x1 02/2021 22
x2 02/2021 20
x2 03/2021 58
x2 04/2021 720
x2 05/2021 192
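Compute the difference, then group by id and month. Sum the differences and convert the total to hours (beg and end must already be datetime columns):
import numpy as np
df.assign(diff=df['end'] - df['beg']).groupby(['id', df['beg'].dt.strftime('%m/%Y')])[['diff']].sum() / np.timedelta64(1, 'h')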
             diff
id beg
x1 01/2021   24.0
   02/2021   22.0
x2 02/2021   20.0
   03/2021   58.0
   04/2021  720.0
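The one-liner above only works once beg and end are datetime columns. A minimal self-contained sketch of the same idea follows, using a small toy frame (hypothetical data, not the frame from the question) and the assumed names `month` and `hours`:

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not the frame from the question)
df = pd.DataFrame({
    'id': ['a', 'a', 'b'],
    'beg': pd.to_datetime(['2021-01-01 00:00', '2021-01-10 00:00', '2021-02-01 06:00']),
    'end': pd.to_datetime(['2021-01-01 12:00', '2021-01-10 10:00', '2021-02-02 06:00']),
})

# Label every row with the month of its start timestamp
month = df['beg'].dt.strftime('%m/%Y').rename('month')

# Subtracting datetime columns gives timedeltas; sum them per (id, month),
# then divide by one hour to get a float number of hours
hours = (df['end'] - df['beg']).groupby([df['id'], month]).sum() / np.timedelta64(1, 'h')
print(hours)
```

Note that grouping on the month of beg attributes the whole interval to its starting month, so an interval that crosses a month boundary is not split.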
First, we need a workaround so that each month is labelled correctly:
# Convert your data to datetime
df[['beg','end']] = df[['beg','end']].astype('datetime64[ns]')
# Identify rows whose start and end fall in different months
months_diff = df.beg.dt.month < df.end.dt.month
# Split a row that spans two months into two rows, cut at midnight of the end date
def deal_with_diff_months(row):
    actual_month = [row['id'], row['beg'], row['end'].floor('d')]
    next_month = [row['id'], row['end'].floor('d'), row['end']]
    return actual_month, next_month
# Create a new dataframe for the split rows
df_tmp = df[months_diff].apply(deal_with_diff_months, axis=1)
df_tmp = pd.DataFrame(df_tmp.explode().tolist(), columns=df.columns)
# Rebuild the dataframe with the split rows (DataFrame.append was removed in pandas 2.0)
df = pd.concat([df[~months_diff], df_tmp])
Now we can use the code block from the original answer below:
# Create a new column to group by month as well
df['month'] = df['beg'].dt.strftime('%m/%Y')
# Group by id and month, then calculate and sum the difference
result = df.groupby(['id','month']).apply(lambda x: (x['end'] - x['beg']).sum())
# Convert the difference to hours
result = (result.dt.total_seconds()/60/60).astype(int)
Output:
id  month
x1  01/2021     24
    02/2021     22
x2  02/2021     20
    03/2021     58
    04/2021    720
    05/2021      0
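The helper above cuts a spanning row at midnight of its end date, which is why 05/2021 shows 0 in this output. As an alternative, here is a rough sketch that cuts each interval at the first day of every month it crosses. The function name `hours_per_month` is hypothetical, and the sketch assumes timezone-naive timestamps with beg <= end:

```python
import pandas as pd

def hours_per_month(df):
    """Split each [beg, end) interval at month starts and sum hours per id and month."""
    pieces = []
    for row in df.itertuples(index=False):
        start = row.beg
        while start < row.end:
            # First instant of the month following `start`
            next_month = (start.to_period('M') + 1).to_timestamp()
            stop = min(next_month, row.end)
            hours = (stop - start) / pd.Timedelta(hours=1)
            pieces.append((row.id, start.strftime('%m/%Y'), hours))
            start = stop
    out = pd.DataFrame(pieces, columns=['id', 'month', 'hours'])
    return out.groupby(['id', 'month'])['hours'].sum()

# The frame from the question, with both columns parsed as datetimes
df = pd.DataFrame({
    'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': pd.to_datetime(['2021-01-01', '2021-02-03', '2021-02-04', '2021-02-05',
                           '2021-02-06', '2021-03-05', '2021-04-01']),
    'end': pd.to_datetime(['2021-01-02 00:00:00', '2021-02-03 12:00:00',
                           '2021-02-04 10:00:00', '2021-02-05 10:00:00',
                           '2021-02-06 10:00:00', '2021-03-07 10:00:00',
                           '2021-05-08 00:00:00']),
})
result = hours_per_month(df)
print(result)
```

With the question's frame this gives 720 hours for x2 in 04/2021 and 168 hours in 05/2021, since midnight on 1 May to midnight on 8 May is seven days. The 192 listed in the expected output would require counting 8 May as a full day. The row-wise loop is simple rather than fast, so it is best suited to small frames.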
You can try this:
df = pd.DataFrame(
{'id':['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
'beg':['2021-01-01 00:00:00', '2021-02-03 00:00:00','2021-02-04 00:00:00','2021-02-05 00:00:00','2021-02-06 00:00:00','2021-03-05 00:00:00','2021-04-08 00:00:00'],
'end':['2021-01-02 00:00:00','2021-02-03 12:00:00','2021-02-04 10:00:00','2021-02-05 10:00:00','2021-02-06 10:00:00','2021-03-07 10:00:00','2021-05-08 00:00:00']})
df['beg'] = pd.to_datetime(df['beg'], format='%Y-%m-%d %H:%M:%S')
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d %H:%M:%S')
hours_diff = []
for i in range(len(df)):
    diff = df['end'][i] - df['beg'][i]
    days, seconds = diff.days, diff.seconds
    hours = days * 24 + seconds // 3600
    hours_diff.append(hours)
df['hours_diff'] = hours_diff
print(df)
Output:
   id        beg                 end  hours_diff
0  x1 2021-01-01 2021-01-02 00:00:00          24
1  x1 2021-02-03 2021-02-03 12:00:00          12
2  x1 2021-02-04 2021-02-04 10:00:00          10
3  x2 2021-02-05 2021-02-05 10:00:00          10
4  x2 2021-02-06 2021-02-06 10:00:00          10
5  x2 2021-03-05 2021-03-07 10:00:00          58
6  x2 2021-04-08 2021-05-08 00:00:00         720
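The loop above can probably be replaced by a vectorized expression, since subtracting two datetime columns yields a timedelta column. A minimal sketch on two of the rows shown above, flooring to whole hours as the loop does:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['x1', 'x2'],
    'beg': pd.to_datetime(['2021-02-03 00:00:00', '2021-03-05 00:00:00']),
    'end': pd.to_datetime(['2021-02-03 12:00:00', '2021-03-07 10:00:00']),
})

# Elapsed seconds per row, floored to whole hours and stored as integers
elapsed = df['end'] - df['beg']
df['hours_diff'] = (elapsed.dt.total_seconds() // 3600).astype(int)
print(df)
```

This avoids indexing row by row, which tends to matter once the frame has more than a few thousand rows.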