How to group a list and evaluate the mean squared error?

Question

I'm writing a custom metric function, and here are the steps I implemented:

1. I have a list of floats in preds and a list of int 0/1 values in target.
2. I round preds.
3. I need to group by preds.
4. Compute the mean target value for each group of rounded preds.
5. Compute the MSE between the grouped preds and their mean target values.
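To make steps 1 to 5 concrete, here is a small worked example on made-up data (three hypothetical predictions, not taken from the original post), rounding to 2 decimals:

```latex
\begin{aligned}
\text{preds} &= (0.101,\ 0.104,\ 0.5), \qquad \text{target} = (1,\ 0,\ 1) \\
\text{rounded} &= (0.10,\ 0.10,\ 0.50) \\
\bar{y}_{0.10} &= \tfrac{1 + 0}{2} = 0.5, \qquad \bar{y}_{0.50} = \tfrac{1}{1} = 1.0 \\
\mathrm{MSE} &= \tfrac{(0.10 - 0.5)^2 + (0.50 - 1.0)^2}{2} = \tfrac{0.16 + 0.25}{2} = 0.205
\end{aligned}
```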

This is what df looks like before the groupby:

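import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# preds: list of floats, target: list of 0/1 ints
rounded = [np.round(x, 2) for x in preds]

df = pd.DataFrame({'target': target, 'preds': rounded})

# Mean target per rounded prediction
df = df.groupby('preds')['target'].mean().to_frame().reset_index()

mse = mean_squared_error(df['target'], df['preds'])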

And this is what it looks like after groupby and mean() (since I can't display the groupby object itself properly):

Basically, I don't know how to group by on two Python lists.

I did a groupby on one list like this:

from itertools import groupby
gr_list = [list(j) for i, j in groupby(rounded)]  # groups consecutive equal values only

But I have no idea how to group the second list based on the gr_list grouping.
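A minimal stdlib-only sketch of grouping the second list alongside the first with itertools.groupby. It assumes rounded and target are equal-length lists; the values below are made up for illustration. The key detail is that itertools.groupby only merges consecutive equal keys, so the (pred, target) pairs have to be sorted by pred first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical example data (not from the original post)
rounded = [0.1, 0.5, 0.1]
target = [1, 1, 0]

# itertools.groupby only merges *consecutive* equal keys,
# so on unsorted data the same pred can appear in several groups.
print([list(g) for _, g in groupby(rounded)])  # [[0.1], [0.5], [0.1]]

# Pair each rounded pred with its target and sort by pred, so that
# equal preds become adjacent; then group on the pred.
pairs = sorted(zip(rounded, target), key=itemgetter(0))
mean_target = {}
for pred, group in groupby(pairs, key=itemgetter(0)):
    targets = [t for _, t in group]
    mean_target[pred] = sum(targets) / len(targets)

print(mean_target)  # {0.1: 0.5, 0.5: 1.0}
```

From here, list(mean_target) and list(mean_target.values()) give the grouped preds and the mean targets to pass to mean_squared_error.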

Answer 1

Not the cleanest code, but I managed to do it like this:

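from collections import defaultdict
from sklearn.metrics import mean_squared_error

# Collect the target values for each rounded prediction
d = defaultdict(list)
for i, item in enumerate(rounded):  # rounded is the rounded preds
    d[item].append(target[i])

# Mean target per rounded prediction
meanDict = {}
for k, v in d.items():
    meanDict[k] = sum(v) / float(len(v))

# Unzip keys (rounded preds) and values (mean targets)
preds, target = zip(*meanDict.items())

mse = mean_squared_error(target, preds)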
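Since the goal is a custom metric function, the same defaultdict approach can be wrapped into one reusable function. This is a sketch under assumptions: the name grouped_mse and the decimals parameter are hypothetical, it uses the built-in round() instead of np.round (the two can differ in rare edge cases), and it computes the MSE by hand so it has no sklearn dependency.

```python
from collections import defaultdict


def grouped_mse(preds, target, decimals=2):
    """Hypothetical helper: MSE between each rounded prediction and the
    mean target of the samples sharing that rounded prediction."""
    # Collect targets per rounded prediction
    groups = defaultdict(list)
    for p, t in zip(preds, target):
        groups[round(p, decimals)].append(t)

    # Mean target per group
    means = {k: sum(v) / len(v) for k, v in groups.items()}

    # Unweighted MSE over the groups (each group counts once)
    return sum((k - m) ** 2 for k, m in means.items()) / len(means)


print(grouped_mse([0.101, 0.104, 0.5], [1, 0, 1]))  # about 0.205, up to float rounding
```

It assumes non-empty input; an empty preds list would raise ZeroDivisionError.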

Answer 2

If I understand correctly, here is a reproducible example of a more idiomatic way to do what you are trying to achieve:

import random

import pandas as pd

# Random example data: float predictions and 0/1 targets
preds = [random.random() for _ in range(1_000)]
target = [random.randint(0, 1) for _ in range(1_000)]

df = pd.DataFrame({"preds": preds, "target": target})

# Steps 1 to 4 of your post: round preds to 2 decimals,
# group by the rounded value and take the mean target per group
df = df.round({"preds": 2}).groupby("preds").agg("mean").reset_index()

print(df)
# Output
     preds    target
0     0.00  1.000000
1     0.01  0.555556
2     0.02  0.375000
3     0.03  0.375000
4     0.04  0.416667
..     ...       ...
96    0.96  0.666667
97    0.97  0.500000
98    0.98  0.375000
99    0.99  0.461538
100   1.00  0.285714

from sklearn.metrics import mean_squared_error

# Step 5
print(mean_squared_error(df["preds"], df["target"]))  # 0.1084811098077257
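Written out, step 5 as computed above is the following, in notation introduced here for illustration: G is the number of distinct rounded predictions (rows of the grouped frame), p_g the g-th rounded prediction, n_g the number of samples in that group, \hat{y}_i the raw predictions and y_i the targets.

```latex
\mathrm{MSE} = \frac{1}{G} \sum_{g=1}^{G} \left( p_g - \bar{y}_g \right)^2,
\qquad
\bar{y}_g = \frac{1}{n_g} \sum_{i \,:\, \operatorname{round}(\hat{y}_i) = p_g} y_i
```

Because mean_squared_error averages over the rows of the grouped frame, every group counts once, regardless of how many samples it contains.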

Answer 1 (score 0, 2022-09-24 09:11:33)

Answer 2 (score 0, 2022-09-25 06:47:49)
