How to group lists and evaluate mean squared error?
I'm writing a custom metric function, and here are the steps I implemented:
1. Take a list of floats in `preds` and a list of int `0-1` values in `target`.
2. Round the values in `preds`.
3. Do a `groupby` on `preds`.
4. Take the mean of the `target` values for each grouped `preds` value.
5. Compute the `MSE` between the grouped `preds` and the mean `target`.
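Written as a formula, the five steps compute the metric below. The notation is mine, not the poster's: p_i are the raw predictions, y_i in {0, 1} the targets, and G the set of distinct predictions after rounding to two decimals (the precision used in the code in this post).

```latex
I_g = \{\, i : \operatorname{round}(p_i, 2) = g \,\}, \qquad
\bar{y}_g = \frac{1}{|I_g|} \sum_{i \in I_g} y_i

\mathrm{MSE} = \frac{1}{|G|} \sum_{g \in G} \left( g - \bar{y}_g \right)^2
```

Because the MSE is taken after the `groupby`, each bucket g contributes exactly one term, no matter how many samples fell into it.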
That's how `df` looks before the `groupby`:
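import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

rounded = [np.round(x, 2) for x in preds]  # preds: list of floats
df = pd.DataFrame({'target': target, 'preds': rounded})
df = df.groupby('preds')['target'].mean().to_frame().reset_index()
mse = mean_squared_error(df['target'], df['preds'])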
And that's how it looks after `groupby` and `mean()` (as I can't properly display the `groupby` object itself):
Basically, I don't know how to `groupby` on two Python `list`s.
I did a `groupby` on one list like this:
from itertools import groupby  # groups only *consecutive* equal items
gr_list = [list(j) for i, j in groupby(rounded)]
But I have no clue how to group the second list based on the `gr_list` grouping.
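`itertools.groupby` only merges *consecutive* equal keys, which is why grouping one list first and then lining up the second is awkward. A minimal sketch (the helper name `group_targets_by_pred` is hypothetical) pairs the two lists with `zip`, sorts the pairs by prediction, and groups them in one pass, so each target stays attached to its prediction:

```python
from itertools import groupby
from operator import itemgetter


def group_targets_by_pred(rounded, target):
    """Hypothetical helper: map each rounded prediction to its list of targets.

    itertools.groupby only merges consecutive equal keys, so the
    (pred, target) pairs are sorted by pred before grouping.
    """
    pairs = sorted(zip(rounded, target), key=itemgetter(0))
    return {
        pred: [t for _, t in group]
        for pred, group in groupby(pairs, key=itemgetter(0))
    }


rounded = [0.1, 0.5, 0.1, 0.9, 0.5]
target = [0, 1, 1, 0, 1]
groups = group_targets_by_pred(rounded, target)
print(groups)  # {0.1: [0, 1], 0.5: [1, 1], 0.9: [0]}
```

The per-group means then follow as `{k: sum(v) / len(v) for k, v in groups.items()}`.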
Not the cleanest code, but I managed to do it like this:
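from collections import defaultdict
from sklearn.metrics import mean_squared_error

# Collect the target values belonging to each rounded prediction
d = defaultdict(list)
for i, item in enumerate(rounded):  # rounded is rounded preds
    d[item].append(target[i])

# Mean target per rounded prediction
meanDict = {}
for k, v in d.items():
    meanDict[k] = sum(v) / float(len(v))

grouped_preds, mean_target = zip(*meanDict.items())
mse = mean_squared_error(mean_target, grouped_preds)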
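If NumPy is already in use, the same grouping can be done without a Python loop. This is a sketch, not the poster's code: `grouped_mse` is a hypothetical helper name, and it swaps the dictionary for `np.unique(..., return_inverse=True)` plus `np.bincount`, which sum the targets and count the samples per distinct rounded prediction:

```python
import numpy as np


def grouped_mse(preds, target, decimals=2):
    """Hypothetical helper: MSE between each distinct rounded prediction
    and the mean target of the samples that share it."""
    rounded = np.round(np.asarray(preds, dtype=float), decimals)
    target = np.asarray(target, dtype=float)
    # keys: sorted distinct rounded preds; inverse[i]: group index of sample i
    keys, inverse = np.unique(rounded, return_inverse=True)
    # Per-group sum of targets divided by per-group count = per-group mean
    mean_target = np.bincount(inverse, weights=target) / np.bincount(inverse)
    # Same value as sklearn's mean_squared_error(mean_target, keys)
    return float(np.mean((keys - mean_target) ** 2))


print(grouped_mse([0.101, 0.104, 0.498, 0.502, 0.9], [0, 1, 1, 1, 0]))
```

Here the buckets are 0.1, 0.5 and 0.9 with mean targets 0.5, 1.0 and 0.0, so the result is (0.16 + 0.25 + 0.81) / 3, roughly 0.4067.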
Here is a reproducible example of a more idiomatic way to do what you are trying to achieve, if I understand correctly:
import random

import pandas as pd

preds = [random.random() for _ in range(1_000)]
target = [random.randint(0, 1) for _ in range(1_000)]
df = pd.DataFrame({"preds": preds, "target": target})

# Steps 1 to 4 of your post: round, group by preds, mean target per group
df = df.round({"preds": 2}).groupby("preds").agg("mean").reset_index()
print(df)
# Output
preds target
0 0.00 1.000000
1 0.01 0.555556
2 0.02 0.375000
3 0.03 0.375000
4 0.04 0.416667
.. ... ...
96 0.96 0.666667
97 0.97 0.500000
98 0.98 0.375000
99 0.99 0.461538
100 1.00 0.285714
from sklearn.metrics import mean_squared_error
# Step 5
print(mean_squared_error(df["preds"], df["target"])) # 0.1084811098077257
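Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.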