pandas with assign, loc & pd.Timedelta --> mean
I have a dataframe with 3 columns (id, date1, date2):
import pandas as pd

data = [['C', '05/06/2021','07/09/2021'],['A', '15/04/2021','08/09/2021'],['A','15/10/2021','09/12/2021'],['C', '03/07/2021','10/09/2021'],['C', '13/07/2021','11/09/2021'],['C', '25/10/2021','12/12/2021'],['C', '26/09/2021','07/12/2021'],['C', '10/08/2021','07/12/2021'],['C', '28/07/2021','13/12/2021'],['A', '15/05/2021','13/12/2021'], ['C', '13/06/2021','13/12/2021'],['A', '17/05/2021','13/12/2021'],['C', '27/06/2021','13/12/2021'], ['B', '18/06/2021','13/12/2021']]
df_test = pd.DataFrame(data, columns = ['id', 'date1', 'date2'])
df_test['date1'] = pd.to_datetime(df_test['date1'],dayfirst=True)
df_test['date2'] = pd.to_datetime(df_test['date2'],dayfirst=True)
I want to calculate the difference date2 - date1 and then the mean of the differences that are at least 100 days. I have 2 methods: the first one works, but the second doesn't. How can I fix it?
The first one works:
df_final=(df_test
.sort_values(by='id')
.assign(diffe=df_test['date2']- df_test['date1']
)
)
and
test=df_final.loc[df_final['diffe']>=pd.Timedelta(100, 'D')]
test['diffe'].mean()
Second method:
df_final=(df_test
.sort_values(by='id') # sort so the ids are in order
.assign(diffe=df_test['date2']- df_test['date1']
)
.loc[df_reservation_delay['diffe']>=pd.Timedelta(100, 'D')]
.mean()
)
I get an error (KeyError: 'diffe'). Do you have any idea why?
Have a nice day.
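You need to pass a callable to loc, because the diffe column does not exist yet: it is only created by assign inside the chain, so it cannot be looked up on a dataframe that was defined before the chain ran. The callable receives the intermediate dataframe, which already has the new column.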
Also, it is better to explicitly provide the column name in loc to avoid a FutureWarning:
FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
df_final=(df_test
.sort_values(by='id') # sort so the ids are in order
.assign(diffe=df_test['date2']- df_test['date1']
)
.loc[lambda d: d['diffe']>=pd.Timedelta(100, 'D'), 'diffe']
.mean()
)
output:
Timedelta('169 days 09:00:00')
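As a possible variation on the same idea, the callable can also be used inside assign, so that no step of the chain refers to a dataframe by name. The sketch below is only an illustration: the three-row sample and the names df and result are made up for the example and are not the data from the question.

```python
import pandas as pd

# Hypothetical three-row sample (not the data from the question)
df = pd.DataFrame({
    "id": ["B", "A", "A"],
    "date1": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-06-01"]),
    "date2": pd.to_datetime(["2021-03-01", "2021-07-01", "2021-12-01"]),
})

result = (
    df
    .sort_values(by="id")
    # the lambda receives the sorted frame, so df is not referenced by name
    .assign(diffe=lambda d: d["date2"] - d["date1"])
    # the lambda receives the frame that already contains 'diffe';
    # selecting the 'diffe' column returns a Series, so mean() gives one Timedelta
    .loc[lambda d: d["diffe"] >= pd.Timedelta(100, "D"), "diffe"]
    .mean()
)

print(result)  # → 182 days 00:00:00
```

Here the differences are 59, 181 and 183 days; only the last two pass the 100-day filter, so the mean is 182 days.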