
pandas: calculate difference between two columns

I'm new to Stack Overflow.

I want to calculate, per id and month, the number of hours between two timestamps (beg and end). What is the best way to do this?

import pandas as pd

df = pd.DataFrame({
    'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': ['2021-01-01 00:00:00',
            '2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-02-05 00:00:00',
            '2021-02-06 00:00:00', '2021-03-05 00:00:00', '2021-04-01 00:00:00'],
    'end': ['2021-01-02 00:00:00',
            '2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-02-05 10:00:00',
            '2021-02-06 10:00:00', '2021-03-07 10:00:00', '2021-05-08 00:00:00'],
})

Expected output:

x1  01/2021   24
x1  02/2021   22
x2  02/2021   20
x2  03/2021   58
x2  04/2021  720
x2  05/2021  192

Calculate the difference, then group by id and month. Sum the differences and convert the result to hours:

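import numpy as np

# The .dt accessor needs real datetimes, so convert the string columns first
df[['beg', 'end']] = df[['beg', 'end']].apply(pd.to_datetime)

# Sum end - beg per id and starting month, then express the total in hours
(df.assign(diff=df['end'] - df['beg'])
   .groupby(['id', df['beg'].dt.strftime('%m/%Y')])[['diff']]
   .sum() / np.timedelta64(1, 'h'))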

             diff
id beg           
x1 01/2021   24.0
   02/2021   22.0
x2 02/2021   20.0
   03/2021   58.0
   04/2021  720.0

First, we need a workaround to label each month properly:

# Convert your data to datetime
df[['beg','end']] = df[['beg','end']].astype('datetime64[ns]')

# Identify rows with distinct months
months_diff = df.beg.dt.month < df.end.dt.month

# Function to split the months for posterior time comparison
def deal_with_diff_months(row):
    actual_month = [row['id'], row['beg'], row['end'].floor('d')]
    next_month = [row['id'], row['end'].floor('d'), row['end']]
    return actual_month, next_month

# Create a new dataframe for split months
df_tmp = df[months_diff].apply(deal_with_diff_months, axis=1)
df_tmp = pd.DataFrame(df_tmp.explode().tolist(), columns=df.columns)

# Renew dataframe with split months
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df[~months_diff], df_tmp])

Now we can use the code chunk below, as originally answered:

# Create a new column to group by month as well
df['month'] = df['beg'].dt.strftime('%m/%Y')

# Group by id and month, then calculate and sum the difference
result = df.groupby(['id','month']).apply(lambda x: (x['end'] - x['beg']).sum())

# Convert the difference to hours
result = (result.dt.total_seconds()/60/60).astype(int)

Output:

id  month  
x1  01/2021     24
    02/2021     22
x2  02/2021     20
    03/2021     58
    04/2021    720
    05/2021      0
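The split above cuts each row at midnight of its end date, so the hours are not divided at the month boundary itself. A minimal sketch of one alternative, assuming the intent is to attribute hours to the calendar month in which they fall: cut every interval at the first instant of each following month, then sum per id and month. The helper `split_by_month` and the shortened sample frame are illustrative names and data, not part of the answers above.

```python
import pandas as pd

def split_by_month(beg, end):
    """Yield (month, hours) pieces of the interval [beg, end), cut at month starts."""
    cur = beg
    while cur < end:
        # First instant of the month that follows `cur`
        next_month = (cur.to_period('M') + 1).to_timestamp()
        piece_end = min(next_month, end)
        yield cur.to_period('M'), (piece_end - cur) / pd.Timedelta(hours=1)
        cur = piece_end

# Three rows taken from the question's data
df = pd.DataFrame({
    'id': ['x1', 'x2', 'x2'],
    'beg': pd.to_datetime(['2021-01-01 00:00:00', '2021-03-05 00:00:00', '2021-04-01 00:00:00']),
    'end': pd.to_datetime(['2021-01-02 00:00:00', '2021-03-07 10:00:00', '2021-05-08 00:00:00']),
})

# One output row per (id, month) piece of every interval
rows = [
    (row.id, month, hours)
    for row in df.itertuples(index=False)
    for month, hours in split_by_month(row.beg, row.end)
]

hours_per_month = (
    pd.DataFrame(rows, columns=['id', 'month', 'hours'])
      .groupby(['id', 'month'])['hours']
      .sum()
)
print(hours_per_month)
```

For the row running from 2021-04-01 to 2021-05-08 this gives 720 hours for April and 168 hours for May (seven full days). The question's expected output lists 192 for May, which would correspond to counting the end date as a full day; which of the two is wanted depends on whether `end` is meant to be inclusive.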

You may try this:

import pandas as pd

df = pd.DataFrame(
        {'id':['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'], 
        'beg':['2021-01-01 00:00:00', '2021-02-03 00:00:00','2021-02-04 00:00:00','2021-02-05 00:00:00','2021-02-06 00:00:00','2021-03-05 00:00:00','2021-04-08 00:00:00'],
        'end':['2021-01-02 00:00:00','2021-02-03 12:00:00','2021-02-04 10:00:00','2021-02-05 10:00:00','2021-02-06 10:00:00','2021-03-07 10:00:00','2021-05-08 00:00:00']})

df['beg'] = pd.to_datetime(df['beg'], format='%Y-%m-%d %H:%M:%S')
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d %H:%M:%S')

# Whole hours between end and beg, computed for all rows at once
df['hours_diff'] = (df['end'] - df['beg']) // pd.Timedelta(hours=1)

print(df)

Output:

   id        beg                 end  hours_diff
0  x1 2021-01-01 2021-01-02 00:00:00          24
1  x1 2021-02-03 2021-02-03 12:00:00          12
2  x1 2021-02-04 2021-02-04 10:00:00          10
3  x2 2021-02-05 2021-02-05 10:00:00          10
4  x2 2021-02-06 2021-02-06 10:00:00          10
5  x2 2021-03-05 2021-03-07 10:00:00          58
6  x2 2021-04-08 2021-05-08 00:00:00         720
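The table above has one value per row, while the question asks for totals per id and month. A minimal sketch of that last step, assuming each row is assigned to the month of its `beg` timestamp (so a row spanning two months is counted entirely in the first one). The `month` column and the `summary` name are illustrative additions, and the sample frame is a shortened copy of the data.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['x1', 'x1', 'x2'],
    'beg': pd.to_datetime(['2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-03-05 00:00:00']),
    'end': pd.to_datetime(['2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-03-07 10:00:00']),
})

# Whole hours per row, as in the answer above
df['hours_diff'] = (df['end'] - df['beg']) // pd.Timedelta(hours=1)

# Illustrative helper column: label each row by the month of `beg`
df['month'] = df['beg'].dt.strftime('%m/%Y')

# Total hours per id and month
summary = df.groupby(['id', 'month'], as_index=False)['hours_diff'].sum()
print(summary)
```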
