
pandas calculate difference between two columns

I am new to Stack Overflow.

For each id and month, I want to calculate the number of hours between two timestamps (end and beg). What is the best way to get this?

import pandas as pd

df = pd.DataFrame({
    'id':  ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': ['2021-01-01 00:00:00', '2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-02-05 00:00:00',
            '2021-02-06 00:00:00', '2021-03-05 00:00:00', '2021-04-01 00:00:00'],
    'end': ['2021-01-02 00:00:00', '2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-02-05 10:00:00',
            '2021-02-06 10:00:00', '2021-03-07 10:00:00', '2021-05-08 00:00:00'],
})

Expected output

x1 01/2021   24
x1 02/2021   22
x2 02/2021   20
x2 03/2021   58
x2 04/2021  720
x2 05/2021  192

Calculate the difference, then group by id and month. Sum the differences and convert the total to hours.

import numpy as np

# beg and end must already be datetime64, e.g. df['beg'] = pd.to_datetime(df['beg'])
df.assign(diff=df['end'] - df['beg']).groupby(['id', df['beg'].dt.strftime('%m/%Y')])[['diff']].agg('sum') / np.timedelta64(1, 'h')

             diff
id beg           
x1 01/2021   24.0
   02/2021   22.0
x2 02/2021   20.0
   03/2021   58.0
   04/2021  720.0
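
One thing to keep in mind with a `%m/%Y` string as the group key: the groups sort as text, so once the data covers more than one year, `01/2022` comes before `12/2021`. A minimal sketch of an alternative, assuming a monthly `Period` key is acceptable in the result (the sample rows below are made up for illustration and are not the question's data):

```python
import pandas as pd

# Hypothetical sample that crosses a year boundary (not the question's data)
df = pd.DataFrame({
    'id':  ['x1', 'x1'],
    'beg': pd.to_datetime(['2021-12-30 00:00:00', '2022-01-02 00:00:00']),
    'end': pd.to_datetime(['2021-12-30 06:00:00', '2022-01-02 12:00:00']),
})

# Group on a monthly Period instead of a formatted string,
# so the months sort chronologically rather than alphabetically
month = df['beg'].dt.to_period('M').rename('month')
hours = (df['end'] - df['beg']).dt.total_seconds().div(3600)
result = hours.groupby([df['id'], month]).sum()
print(result)
```

If the `MM/YYYY` text is needed for display, the index level can still be formatted afterwards.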

First, we need a workaround so that each interval is labeled with the correct month:

# Convert your data to datetime
df[['beg','end']] = df[['beg','end']].astype('datetime64[ns]')

# Identify rows whose interval ends in a later month than it begins
# (comparing monthly periods also covers intervals that cross a year boundary)
months_diff = df['beg'].dt.to_period('M') < df['end'].dt.to_period('M')

# Split a row at the month boundary so that each piece lies within one month
# (assumes an interval touches at most two calendar months)
def deal_with_diff_months(row):
    # First instant of the month in which the interval ends
    month_start = row['end'].to_period('M').to_timestamp()
    actual_month = [row['id'], row['beg'], month_start]
    next_month = [row['id'], month_start, row['end']]
    return actual_month, next_month

# Create a new dataframe for split months
df_tmp = df[months_diff].apply(deal_with_diff_months, axis=1)
df_tmp = pd.DataFrame(df_tmp.explode().tolist(), columns=df.columns)

# Renew dataframe with split months (DataFrame.append was removed in pandas 2.0)
df = pd.concat([df[~months_diff], df_tmp], ignore_index=True)

Now we can use the code block from the original answer below:

# Create a new column to group by month as well
df['month'] = df['beg'].dt.strftime('%m/%Y')

# Group by id and month, then calculate and sum the difference
result = df.groupby(['id','month']).apply(lambda x: (x['end'] - x['beg']).sum())

# Convert the difference to hours
result = (result.dt.total_seconds()/60/60).astype(int)

Output:

id  month  
x1  01/2021     24
    02/2021     22
x2  02/2021     20
    03/2021     58
    04/2021    720
    05/2021    168
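
The split above is written for an interval that touches two calendar months. If an interval can run across three or more months, one option is to walk it month by month. The following is only a sketch under that assumption: `split_by_month` is a hypothetical helper name, and the sample uses the last two x2 rows from the question. Splitting exactly at midnight on the first of the month gives 168 hours for 05/2021 (seven full days, 2021-05-01 to 2021-05-08), not the 192 listed in the question.

```python
import pandas as pd

def split_by_month(beg, end):
    """Yield (month, hours) pieces of the interval [beg, end), one per calendar month."""
    cur = beg
    while cur < end:
        # First instant of the month following `cur`
        next_month = (cur.to_period('M') + 1).to_timestamp()
        piece_end = min(next_month, end)
        yield cur.to_period('M'), (piece_end - cur) / pd.Timedelta(hours=1)
        cur = piece_end

# Last two x2 rows from the question
df = pd.DataFrame({
    'id':  ['x2', 'x2'],
    'beg': pd.to_datetime(['2021-03-05 00:00:00', '2021-04-01 00:00:00']),
    'end': pd.to_datetime(['2021-03-07 10:00:00', '2021-05-08 00:00:00']),
})

# One output row per (id, month) piece, then sum the hours per group
rows = [
    (row.id, month, hours)
    for row in df.itertuples(index=False)
    for month, hours in split_by_month(row.beg, row.end)
]
result = (pd.DataFrame(rows, columns=['id', 'month', 'hours'])
            .groupby(['id', 'month'])['hours'].sum())
print(result)
```

The Python-level loop is slower than a vectorized split, so this fits small to medium frames best.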

You can try this:

df = pd.DataFrame(
        {'id':['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'], 
        'beg':['2021-01-01 00:00:00', '2021-02-03 00:00:00','2021-02-04 00:00:00','2021-02-05 00:00:00','2021-02-06 00:00:00','2021-03-05 00:00:00','2021-04-08 00:00:00'],
        'end':['2021-01-02 00:00:00','2021-02-03 12:00:00','2021-02-04 10:00:00','2021-02-05 10:00:00','2021-02-06 10:00:00','2021-03-07 10:00:00','2021-05-08 00:00:00']})

df['beg'] = pd.to_datetime(df['beg'], format='%Y-%m-%d %H:%M:%S')
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d %H:%M:%S')

hours_diff = []
for i in range(len(df)):
    diff = df['end'][i] - df['beg'][i]
    days, seconds = diff.days, diff.seconds
    hours = days * 24 + seconds // 3600
    hours_diff.append(hours)

df['hours_diff'] =  hours_diff

print(df)

Output:

   id        beg                 end  hours_diff
0  x1 2021-01-01 2021-01-02 00:00:00          24
1  x1 2021-02-03 2021-02-03 12:00:00          12
2  x1 2021-02-04 2021-02-04 10:00:00          10
3  x2 2021-02-05 2021-02-05 10:00:00          10
4  x2 2021-02-06 2021-02-06 10:00:00          10
5  x2 2021-03-05 2021-03-07 10:00:00          58
6  x2 2021-04-08 2021-05-08 00:00:00         720
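
The loop gives hours per row, not yet the per-id, per-month totals the question asks for. As a rough sketch, the same whole-hour count can be computed without a loop and then summed per group. The three sample rows are taken from the question's data; keying each row on the month of `beg` is an assumption and does not split intervals that cross a month boundary.

```python
import pandas as pd

df = pd.DataFrame({
    'id':  ['x1', 'x1', 'x2'],
    'beg': pd.to_datetime(['2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-03-05 00:00:00']),
    'end': pd.to_datetime(['2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-03-07 10:00:00']),
})

# Whole hours per row; floor division matches `days * 24 + seconds // 3600`
df['hours_diff'] = (df['end'] - df['beg']) // pd.Timedelta(hours=1)

# Sum per id and per month of `beg`
month = df['beg'].dt.strftime('%m/%Y').rename('month')
totals = df.groupby(['id', month])['hours_diff'].sum()
print(totals)
```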

