
How to group lists and evaluate mean squared error?

I'm writing a custom metric function, and here are the steps I implemented:

  1. I have a list of floats in preds and a list of int 0/1 values in target.
  2. I round preds.
  3. I need to group by preds.
  4. Compute the mean target value for each group of preds.
  5. Compute the MSE between the grouped preds and their mean target values.
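Written out, steps 1 to 5 compute the following quantity. The notation is assumed here, not taken from the post: $\hat{p}_i$ are the raw predictions, $y_i \in \{0, 1\}$ the targets, $p_1, \dots, p_G$ the $G$ distinct rounded predictions, and $n_g$ the number of predictions that round to $p_g$.

$$
\bar{y}_g = \frac{1}{n_g} \sum_{i \,:\, \operatorname{round}(\hat{p}_i,\, 2) = p_g} y_i,
\qquad
\text{MSE} = \frac{1}{G} \sum_{g=1}^{G} \left( p_g - \bar{y}_g \right)^2
$$

Each group contributes exactly one term, so a rounded value shared by hundreds of predictions weighs the same as one shared by a single prediction.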

This is how df looks before the groupby:

[Image: df before the groupby]

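import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# preds: list of floats, target: list of 0/1 ints (same length)
rounded = [np.round(x, 2) for x in preds]

df = pd.DataFrame({'target': target, 'preds': rounded})

# Mean target per rounded prediction, back to a two-column DataFrame
df = df.groupby('preds')['target'].mean().reset_index()

mse = mean_squared_error(df['target'], df['preds'])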

And this is how it looks after groupby and mean() (I can't display the groupby object itself properly):

[Image: df after groupby and mean()]

Basically, I don't know how to do a groupby on two Python lists.

I did a groupby on one list like this:

from itertools import groupby

# Groups runs of consecutive equal values in rounded
gr_list = [list(j) for i, j in groupby(rounded)]

But I have no idea how to group the second list based on the gr_list grouping.
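One way to extend the itertools.groupby idea to the second list is to zip the two lists, sort the pairs by the rounded prediction, and group on that key. Below is a minimal sketch on hypothetical sample values. The sort matters because itertools.groupby only merges adjacent equal keys, so without it repeated values that are not next to each other would land in separate groups.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: rounded preds and their 0/1 targets
rounded = [0.11, 0.5, 0.11]
target = [1, 1, 0]

# itertools.groupby only merges *adjacent* equal keys,
# so sort the (pred, target) pairs by pred first
pairs = sorted(zip(rounded, target), key=itemgetter(0))

# Map each rounded pred to the list of targets that share it
grouped = {key: [t for _, t in grp] for key, grp in groupby(pairs, key=itemgetter(0))}

print(grouped)  # {0.11: [1, 0], 0.5: [1]}
```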

Not the cleanest code, but I managed to do it like this:

from collections import defaultdict

# Collect the target values for each rounded prediction
d = defaultdict(list)
for pred, t in zip(rounded, target):  # rounded is rounded preds
    d[pred].append(t)


# Mean target value per rounded prediction
meanDict = {}
for k, v in d.items():
    meanDict[k] = sum(v) / float(len(v))


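from sklearn.metrics import mean_squared_error

# Unpack the dict into the grouped preds (keys) and their mean targets (values)
keys, values = zip(*meanDict.items())

mse = mean_squared_error(values, keys)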
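The snippets above can be folded into a single function. This is a minimal sketch, assuming the name binned_mse and the ndigits parameter (both hypothetical). It uses Python's built-in round and statistics.fmean instead of np.round and sklearn; the built-in round and np.round use different implementations, so results may differ for values sitting right on a rounding boundary.

```python
from collections import defaultdict
from statistics import fmean


def binned_mse(preds, target, ndigits=2):
    """Unweighted MSE between each rounded pred and the mean target of its group.

    Hypothetical helper combining the steps above; each group counts once.
    """
    # Steps 1-3: bucket the targets by rounded prediction
    groups = defaultdict(list)
    for p, t in zip(preds, target):
        groups[round(p, ndigits)].append(t)
    # Steps 4-5: squared gap between each rounded pred and its group's mean target
    return fmean((key - fmean(ts)) ** 2 for key, ts in groups.items())


print(binned_mse([0.111, 0.114, 0.5], [1, 0, 1]))  # ≈ 0.20105
```

The sample values are hand-picked: 0.111 and 0.114 both round to 0.11 (mean target 0.5) and 0.5 stays 0.5 (mean target 1.0), giving ((0.11 - 0.5)^2 + (0.5 - 1.0)^2) / 2 = 0.20105.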

If I understand correctly, here is a reproducible example of a more idiomatic way to achieve what you want:

import random

import pandas as pd
from sklearn.metrics import mean_squared_error

preds = [random.random() for _ in range(1_000)]
target = [random.randint(0, 1) for _ in range(1_000)]

df = pd.DataFrame({"preds": preds, "target": target})

# Steps 1 to 4 of your post: round preds, group by them, average target
df = df.round({"preds": 2}).groupby("preds").mean().reset_index()

print(df)
# Output (random data, so your values will differ)
     preds    target
0     0.00  1.000000
1     0.01  0.555556
2     0.02  0.375000
3     0.03  0.375000
4     0.04  0.416667
..     ...       ...
96    0.96  0.666667
97    0.97  0.500000
98    0.98  0.375000
99    0.99  0.461538
100   1.00  0.285714

# Step 5
print(mean_squared_error(df["preds"], df["target"]))  # e.g. 0.1084811098077257
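If you would rather stay with arrays and skip pandas, here is a NumPy-only sketch of the same five steps. It is not part of the original answer, and the function name binned_mse_np is hypothetical. np.unique(..., return_inverse=True) gives each element the index of its group, and np.bincount then sums the targets and counts the elements per group.

```python
import numpy as np


def binned_mse_np(preds, target, decimals=2):
    # Steps 1-2: round the predictions
    rounded = np.round(np.asarray(preds, dtype=float), decimals)
    target = np.asarray(target, dtype=float)
    # Step 3: distinct rounded values, plus the group index of every element
    keys, inverse = np.unique(rounded, return_inverse=True)
    inverse = inverse.ravel()  # keep it 1-D for bincount across NumPy versions
    # Step 4: per-group mean of target = per-group sum / per-group count
    means = np.bincount(inverse, weights=target) / np.bincount(inverse)
    # Step 5: unweighted MSE over the groups
    return float(np.mean((keys - means) ** 2))


print(binned_mse_np([0.111, 0.114, 0.5], [1, 0, 1]))
```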

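Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.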

 