pandas with assign loc & pd.Timedelta --> mean

Question

I have a dataframe with 3 columns (id, date1, date2):

import pandas as pd

data = [['C', '05/06/2021', '07/09/2021'], ['A', '15/04/2021', '08/09/2021'],
        ['A', '15/10/2021', '09/12/2021'], ['C', '03/07/2021', '10/09/2021'],
        ['C', '13/07/2021', '11/09/2021'], ['C', '25/10/2021', '12/12/2021'],
        ['C', '26/09/2021', '07/12/2021'], ['C', '10/08/2021', '07/12/2021'],
        ['C', '28/07/2021', '13/12/2021'], ['A', '15/05/2021', '13/12/2021'],
        ['C', '13/06/2021', '13/12/2021'], ['A', '17/05/2021', '13/12/2021'],
        ['C', '27/06/2021', '13/12/2021'], ['B', '18/06/2021', '13/12/2021']]

df_test = pd.DataFrame(data, columns=['id', 'date1', 'date2'])
df_test['date1'] = pd.to_datetime(df_test['date1'], dayfirst=True)
df_test['date2'] = pd.to_datetime(df_test['date2'], dayfirst=True)

I want to compute the difference date2 - date1 and then take the mean of the differences that are at least 100 days. I have two methods: the first one works, but the second one doesn't. How can I fix it?

The first one works:

df_final=(df_test
.sort_values(by='id')
.assign(diffe=df_test['date2']- df_test['date1']
)   
)

and

test=df_final.loc[df_final['diffe']>=pd.Timedelta(100, 'D')]
test['diffe'].mean()

Second method:

df_final=(df_test
    .sort_values(by='id') # sort rows by id
    .assign(diffe=df_test['date2']- df_test['date1']
    )
    .loc[df_reservation_delay['diffe']>=pd.Timedelta(100, 'D')]
    .mean()
    )

I get an error (KeyError: 'diffe'). Do you have any idea why?

Have a nice day

Answer 1

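You need to pass a callable to loc, because the diffe column does not exist yet on any named dataframe at that point in the chain; it only exists on the intermediate result returned by assign.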
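To make the mechanism concrete, here is a minimal sketch on made-up toy data (the column names start and end and the variable long_rows are hypothetical, not from the question). It also uses a lambda inside assign, which is an optional variation rather than something the fix requires.

```python
import pandas as pd

# Toy data, invented for illustration only (not the question's data).
df = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-01-01"]),
    "end": pd.to_datetime(["2021-01-11", "2021-06-01"]),
})

# Inside the chain, the name `df` still refers to the original frame,
# which has no 'diffe' column. Each lambda instead receives the frame
# produced by the previous step, so 'diffe' is visible to .loc.
long_rows = (
    df
    .assign(diffe=lambda d: d["end"] - d["start"])
    .loc[lambda d: d["diffe"] >= pd.Timedelta(100, "D")]
)

print("diffe" in df.columns)  # the original frame is left unchanged
print(long_rows)
```

Only the second toy row (151 days) passes the 100-day filter, and df itself never gains a diffe column, which is why indexing with a named dataframe inside the chain raises KeyError.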

Also, it is better to select the column explicitly in loc, to avoid this FutureWarning:

FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.

df_final=(df_test
    .sort_values(by='id') # sort rows by id
    .assign(diffe=df_test['date2']- df_test['date1']
    )
    .loc[lambda d: d['diffe']>=pd.Timedelta(100, 'D'), 'diffe']
    .mean()
    )

output: Timedelta('169 days 09:00:00')
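For anyone who wants to reproduce this end to end, the following is a self-contained sketch of the same chain on the question's data. The name mean_diff is hypothetical, and the lambda inside assign is a small variation on the code above (the original passes the precomputed Series, which also works because pandas aligns on the index).

```python
import pandas as pd

data = [['C', '05/06/2021', '07/09/2021'], ['A', '15/04/2021', '08/09/2021'],
        ['A', '15/10/2021', '09/12/2021'], ['C', '03/07/2021', '10/09/2021'],
        ['C', '13/07/2021', '11/09/2021'], ['C', '25/10/2021', '12/12/2021'],
        ['C', '26/09/2021', '07/12/2021'], ['C', '10/08/2021', '07/12/2021'],
        ['C', '28/07/2021', '13/12/2021'], ['A', '15/05/2021', '13/12/2021'],
        ['C', '13/06/2021', '13/12/2021'], ['A', '17/05/2021', '13/12/2021'],
        ['C', '27/06/2021', '13/12/2021'], ['B', '18/06/2021', '13/12/2021']]

df_test = pd.DataFrame(data, columns=['id', 'date1', 'date2'])
for col in ('date1', 'date2'):
    df_test[col] = pd.to_datetime(df_test[col], dayfirst=True)

mean_diff = (
    df_test
    .sort_values(by='id')
    # Compute the difference on the intermediate frame.
    .assign(diffe=lambda d: d['date2'] - d['date1'])
    # Keep rows of at least 100 days and select only 'diffe',
    # so .mean() runs on a single timedelta Series.
    .loc[lambda d: d['diffe'] >= pd.Timedelta(100, 'D'), 'diffe']
    .mean()
)

print(mean_diff)  # 169 days 09:00:00
```

Because loc returns a Series here, mean gives back a single Timedelta and never has to decide what to do with the id or date columns.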
