Group by a column and, based on that, group by another
I have a DataFrame like this:
import pandas as pd

d = {'id': [1, 2, 3, 4, 5, 6],
     'y_true': [0, 0, 1, 1, 1, 0],
     'y_pred': [0.23, 0.01, 0.19, 0.01, 0.3, 0.23]
     }
df = pd.DataFrame(data=d)
I want to group by y_pred and then, for the rows that share the same y_pred value, find the mean of y_true. In other words, the result should look like this:
d1 = {'y_true': [0, 0.5, 1, 1],
      'y_pred': [0.23, 0.01, 0.19, 0.3]
      }
df1 = pd.DataFrame(data=d1)
I know how to group by the y_pred column, but I can only handle y_true manually, row by row.
Try:
df.groupby('y_pred')['y_true'].mean().reset_index()
# equivalent: df.groupby("y_pred").apply(lambda x: x['y_true'].mean()).reset_index(name="y_true")
   y_pred  y_true
0    0.01     0.5
1    0.19     1.0
2    0.23     0.0
3    0.30     1.0
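The output above is sorted by y_pred, whereas the expected df1 in the question lists the groups in order of first appearance (0.23, 0.01, 0.19, 0.3). A minimal sketch, assuming that ordering matters: the `sort=False` and `as_index=False` arguments of `groupby` keep the original group order and leave y_pred as a regular column. The variable name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "y_true": [0, 0, 1, 1, 1, 0],
    "y_pred": [0.23, 0.01, 0.19, 0.01, 0.3, 0.23],
})

# sort=False keeps groups in order of first appearance instead of sorting the keys;
# as_index=False keeps y_pred as a regular column, so reset_index() is not needed.
result = df.groupby("y_pred", as_index=False, sort=False)["y_true"].mean()

# Reorder the columns to match the layout of df1 in the question.
result = result[["y_true", "y_pred"]]
print(result)
```

With the sample data this should give the same rows and column order as df1.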
# Or pass NumPy's mean to agg; for this data the result is the same as with the pandas mean
import numpy as np
df.groupby('y_pred').agg({'y_true': np.mean}).reset_index()
# Named aggregation can compute the pandas mean and the NumPy mean side by side
df.groupby('y_pred').agg(y_true_pd_mean=('y_true', 'mean'), y_true_np_mean=('y_true', np.mean)).reset_index()
   y_pred  y_true_pd_mean  y_true_np_mean
0    0.01             0.5             0.5
1    0.19             1.0             1.0
2    0.23             0.0             0.0
3    0.30             1.0             1.0
# The mean function from the standard-library statistics module can also be passed to agg
import statistics
df.groupby('y_pred').agg({'y_true': statistics.mean}).reset_index()