pandas with assign loc & pd.Timedelta --> mean

I have a DataFrame with 3 columns (id, date1, date2):

import pandas as pd

data = [['C', '05/06/2021','07/09/2021'],['A', '15/04/2021','08/09/2021'],['A','15/10/2021','09/12/2021'],['C', '03/07/2021','10/09/2021'],['C', '13/07/2021','11/09/2021'],['C', '25/10/2021','12/12/2021'],['C', '26/09/2021','07/12/2021'],['C', '10/08/2021','07/12/2021'],['C', '28/07/2021','13/12/2021'],['A', '15/05/2021','13/12/2021'], ['C', '13/06/2021','13/12/2021'],['A', '17/05/2021','13/12/2021'],['C', '27/06/2021','13/12/2021'], ['B', '18/06/2021','13/12/2021']]

df_test = pd.DataFrame(data, columns = ['id', 'date1', 'date2'])
df_test['date1'] = pd.to_datetime(df_test['date1'],dayfirst=True)
df_test['date2'] = pd.to_datetime(df_test['date2'],dayfirst=True)

I want to compute the difference date2 - date1 and then take the mean of the differences that are at least 100 days. I tried two methods: the first one works, but the second one does not. How can I fix it?

The first method works:

df_final=(df_test
.sort_values(by='id')
.assign(diffe=df_test['date2']- df_test['date1']
)   
)

and

test=df_final.loc[df_final['diffe']>=pd.Timedelta(100, 'D')]
test['diffe'].mean()

The second method fails:

df_final=(df_test
    .sort_values(by='id') # sort the rows by id
    .assign(diffe=df_test['date2']- df_test['date1']
    )
    .loc[df_reservation_delay['diffe']>=pd.Timedelta(100, 'D')]
    .mean()
    )

I get an error (KeyError: 'diffe'). Do you have any idea why?

Have a nice day

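You need to use a callable, because the diffe column does not exist yet on the DataFrame you index with. It only exists on the intermediate result of the chain, and a callable passed to loc receives that intermediate result as its argument.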

Also, it is better to select the diffe column explicitly in loc, so that mean() is computed on that column only. Otherwise the reduction runs on the whole DataFrame and triggers this FutureWarning:

FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.

df_final=(df_test
    .sort_values(by='id') # sort the rows by id
    .assign(diffe=df_test['date2']- df_test['date1']
    )
    .loc[lambda d: d['diffe']>=pd.Timedelta(100, 'D'), 'diffe']
    .mean()
    )

output: Timedelta('169 days 09:00:00')
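
For reference, here is a minimal self-contained sketch of the same chain on a reduced sample (three rows taken from the question's data). The names df and mean_delay are only illustrative, and using a callable in assign as well is an optional variation rather than something the fix requires: it makes every step depend only on the intermediate frame instead of the outer variable.

```python
import pandas as pd

# Three rows from the question's data; their date differences are
# 94, 146 and 178 days respectively.
data = [
    ["C", "05/06/2021", "07/09/2021"],
    ["A", "15/04/2021", "08/09/2021"],
    ["B", "18/06/2021", "13/12/2021"],
]
df = pd.DataFrame(data, columns=["id", "date1", "date2"])
df["date1"] = pd.to_datetime(df["date1"], dayfirst=True)
df["date2"] = pd.to_datetime(df["date2"], dayfirst=True)

mean_delay = (
    df
    .sort_values(by="id")
    # The callable receives the sorted intermediate frame as `d`.
    .assign(diffe=lambda d: d["date2"] - d["date1"])
    # `diffe` exists on `d` here, so the boolean mask can be built;
    # the second argument keeps only that column, giving a Series.
    .loc[lambda d: d["diffe"] >= pd.Timedelta(100, "D"), "diffe"]
    .mean()
)
print(mean_delay)
```

Only the 146-day and 178-day rows pass the filter, so the result is the mean of those two values.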
