pandas: calculate the difference between two columns

Question

I'm new to Stack Overflow.

I want to calculate, per id and month, the number of hours between two timestamps (end and beg). What is the best way to do this?

import pandas as pd

# Sample data: one row per interval, timestamps given as strings
df = pd.DataFrame({
    'id':  ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': ['2021-01-01 00:00:00',
            '2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-02-05 00:00:00',
            '2021-02-06 00:00:00', '2021-03-05 00:00:00', '2021-04-01 00:00:00'],
    'end': ['2021-01-02 00:00:00',
            '2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-02-05 10:00:00',
            '2021-02-06 10:00:00', '2021-03-07 10:00:00', '2021-05-08 00:00:00']})

Expected output:

x1  01/2021   24
x1  02/2021   22
x2  02/2021   20
x2  03/2021   58
x2  04/2021  720
x2  05/2021  192

Answer 1

Calculate the difference, then group by id and month. Sum the differences and convert the total to hours. This assumes that beg and end have already been converted to datetime columns.

# Requires `import numpy as np`; 'beg' and 'end' must already be datetime columns
(df.assign(diff=df['end'] - df['beg'])
   .groupby(['id', df['beg'].dt.strftime('%m/%Y')])[['diff']]
   .sum() / np.timedelta64(1, 'h'))

             diff
id beg           
x1 01/2021   24.0
   02/2021   22.0
x2 02/2021   20.0
   03/2021   58.0
   04/2021  720.0
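
The one-liner above depends on earlier setup (datetime columns, NumPy imported as np). A minimal self-contained sketch of the same idea follows. It uses the data from Answer 3, where the last beg is 2021-04-08, and the column name hours is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id':  ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': ['2021-01-01 00:00:00', '2021-02-03 00:00:00', '2021-02-04 00:00:00',
            '2021-02-05 00:00:00', '2021-02-06 00:00:00', '2021-03-05 00:00:00',
            '2021-04-08 00:00:00'],
    'end': ['2021-01-02 00:00:00', '2021-02-03 12:00:00', '2021-02-04 10:00:00',
            '2021-02-05 10:00:00', '2021-02-06 10:00:00', '2021-03-07 10:00:00',
            '2021-05-08 00:00:00'],
})

# Parse the strings into datetime64 columns first
df['beg'] = pd.to_datetime(df['beg'])
df['end'] = pd.to_datetime(df['end'])

# Duration of each row in hours
df['hours'] = (df['end'] - df['beg']).dt.total_seconds() / 3600

# Sum per id and per month of the start timestamp
month = df['beg'].dt.strftime('%m/%Y').rename('month')
hours = df.groupby(['id', month])['hours'].sum()
print(hours)
```

Each interval is attributed entirely to the month of its beg timestamp, so an interval that runs from April into May is counted under 04/2021 only and no 05/2021 row appears.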

Answer 2

First, we need a workaround to label each month properly:

# Convert your data to datetime
df[['beg','end']] = df[['beg','end']].astype('datetime64[ns]')

# Identify rows with distinct months
months_diff = df.beg.dt.month < df.end.dt.month

# Function to split the months for posterior time comparison
def deal_with_diff_months(row):
    actual_month = [row['id'], row['beg'], row['end'].floor('d')]
    next_month = [row['id'], row['end'].floor('d'), row['end']]
    return actual_month, next_month

# Create a new dataframe for split months
df_tmp = df[months_diff].apply(deal_with_diff_months, axis=1)
df_tmp = pd.DataFrame(df_tmp.explode().tolist(), columns=df.columns)

# Rebuild the dataframe with the split rows
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df[~months_diff], df_tmp])

Now we can use the code below, as in the original answer:

# Create a new column to group by month as well
df['month'] = df['beg'].dt.strftime('%m/%Y')

# Group by id and month, then calculate and sum the difference
result = df.groupby(['id','month']).apply(lambda x: (x['end'] - x['beg']).sum())

# Convert the difference to hours
result = (result.dt.total_seconds()/60/60).astype(int)

Output:

id  month  
x1  01/2021     24
    02/2021     22
x2  02/2021     20
    03/2021     58
    04/2021    720
    05/2021      0
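
The split above cuts each cross-month row at midnight of the end date (row['end'].floor('d')), which is why 05/2021 comes out as 0 here. If the goal is to attribute hours to the month in which they fall, one possible variation is to cut at the first day of each month instead. The sketch below is an assumption about that intent, not part of the original answer; split_by_month, pieces and parts are hypothetical names.

```python
import pandas as pd

def split_by_month(beg, end):
    """Split [beg, end] into consecutive pieces, cut at every month start."""
    # Month starts (at midnight) from beg's day up to end
    cuts = pd.date_range(beg.normalize(), end, freq='MS')
    # Keep only the boundaries strictly inside the interval
    cuts = [c for c in cuts if beg < c < end]
    edges = [beg, *cuts, end]
    return list(zip(edges[:-1], edges[1:]))

# Two rows from the question, already parsed as datetimes
df = pd.DataFrame({
    'id': ['x2', 'x2'],
    'beg': pd.to_datetime(['2021-03-05 00:00:00', '2021-04-01 00:00:00']),
    'end': pd.to_datetime(['2021-03-07 10:00:00', '2021-05-08 00:00:00']),
})

# One output row per (interval, month) piece
pieces = [
    (row.id, start, stop)
    for row in df.itertuples(index=False)
    for start, stop in split_by_month(row.beg, row.end)
]
parts = pd.DataFrame(pieces, columns=['id', 'beg', 'end'])

# Label each piece with its month and sum the hours per id and month
parts['month'] = parts['beg'].dt.strftime('%m/%Y')
parts['hours'] = (parts['end'] - parts['beg']).dt.total_seconds() / 3600
result = parts.groupby(['id', 'month'])['hours'].sum()
print(result)
```

For the row running from 2021-04-01 to 2021-05-08 this gives 720 hours for April and 168 hours for May (seven full days). The 192 in the question's expected output corresponds to eight days, so it may assume that the end date itself is included.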

Answer 3

You may try this:

df = pd.DataFrame(
        {'id':['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'], 
        'beg':['2021-01-01 00:00:00', '2021-02-03 00:00:00','2021-02-04 00:00:00','2021-02-05 00:00:00','2021-02-06 00:00:00','2021-03-05 00:00:00','2021-04-08 00:00:00'],
        'end':['2021-01-02 00:00:00','2021-02-03 12:00:00','2021-02-04 10:00:00','2021-02-05 10:00:00','2021-02-06 10:00:00','2021-03-07 10:00:00','2021-05-08 00:00:00']})

df['beg'] = pd.to_datetime(df['beg'], format='%Y-%m-%d %H:%M:%S')
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d %H:%M:%S')

hours_diff = []
for i in range(len(df)):
    diff = df['end'][i] - df['beg'][i]
    days, seconds = diff.days, diff.seconds
    hours = days * 24 + seconds // 3600
    hours_diff.append(hours)

df['hours_diff'] =  hours_diff

print(df)

Output:

   id         beg                  end  hours_diff
0  x1  2021-01-01  2021-01-02 00:00:00          24
1  x1  2021-02-03  2021-02-03 12:00:00          12
2  x1  2021-02-04  2021-02-04 10:00:00          10
3  x2  2021-02-05  2021-02-05 10:00:00          10
4  x2  2021-02-06  2021-02-06 10:00:00          10
5  x2  2021-03-05  2021-03-07 10:00:00          58
6  x2  2021-04-08  2021-05-08 00:00:00         720
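
The row-by-row loop can also be written as a single vectorized expression. This is a sketch of an equivalent calculation on two of the rows above, not the answer's own code:

```python
import pandas as pd

df = pd.DataFrame({
    'beg': pd.to_datetime(['2021-02-03 00:00:00', '2021-03-05 00:00:00']),
    'end': pd.to_datetime(['2021-02-03 12:00:00', '2021-03-07 10:00:00']),
})

# Whole hours between end and beg, computed for all rows at once
df['hours_diff'] = ((df['end'] - df['beg']).dt.total_seconds() // 3600).astype(int)
print(df)
```

Subtracting two datetime columns yields a timedelta column, so no explicit loop over the index is needed. Summing hours_diff per id and month would still be a separate groupby step.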

Answer 1
3 2021-04-22 13:03:37

Answer 2
1 2021-04-22 12:53:57

Answer 3
0 2021-04-22 13:03:26
