Replace missing values with different values from the same column in a pandas DataFrame

Question

     A      B     C   D
1  2010    one    0   0
2  2020    one    2   4
3  2007    two    0   8
4  2010    one    8   4
5  2020    four   6  12
6  2007    three  0  14
7  2006    four   7  14
8  2010    two    10 12

I need to replace the 0s in column C with the average of the C values for the same year. For example, the 2010 C value would be 9. What is the best way to do this? I have over 10,000 rows.
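
To check the arithmetic in the example, here is a small sketch that rebuilds the sample table (only columns A and C, copied from the rows above) and computes each year's mean of C while ignoring the zeros. It is an illustration of the requirement, not one of the posted answers.

```python
import pandas as pd

# Sample data from the question (columns A and C only)
df = pd.DataFrame({
    "A": [2010, 2020, 2007, 2010, 2020, 2007, 2006, 2010],
    "C": [0, 2, 0, 8, 6, 0, 7, 10],
})

# Mean of the nonzero C values per year; a year whose values are all 0 drops out
nonzero_means = df.loc[df["C"] != 0].groupby("A")["C"].mean()
print(nonzero_means.to_dict())
```

For 2010 this gives (8 + 10) / 2 = 9, matching the example, while 2007 does not appear at all because all of its C values are 0.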

Answer 1

You can use replace to change the 0s in column C to np.nan, then use fillna with map to fill in each year's average:

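import numpy as np

# Treat 0 in column C as missing
df['C'] = df['C'].replace(0, np.nan)

# Fill each NaN with the mean C of that row's year (mean skips NaN)
df['C'] = df['C'].fillna(df['A'].map(df.groupby('A')['C'].mean()))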

print(df)

      A      B     C   D
0  2010    one   9.0   0
1  2020    one   2.0   4
2  2007    two   NaN   8
3  2010    one   8.0   4
4  2020   four   6.0  12
5  2007  three   NaN  14
6  2006   four   7.0  14
7  2010    two  10.0  12

2007 is still NaN because it has no values other than 0 in the initial data.
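
A compact, self-contained sketch of the same idea follows. It swaps the map-over-means step for groupby(...).transform("mean"), a different but standard pandas technique, and rebuilds the sample data from the question with only columns A and C. It also shows why 2007 stays NaN: after the replacement its group holds only NaN, so its mean is NaN too.

```python
import numpy as np
import pandas as pd

# Sample data from the question (columns A and C only)
df = pd.DataFrame({
    "A": [2010, 2020, 2007, 2010, 2020, 2007, 2006, 2010],
    "C": [0, 2, 0, 8, 6, 0, 7, 10],
})

# Treat 0 as missing so it does not drag the yearly mean down
c = df["C"].replace(0, np.nan)

# transform("mean") returns a Series aligned with df's rows,
# holding the mean of C for each row's year (NaN values are skipped)
year_mean = c.groupby(df["A"]).transform("mean")

# Fill only the missing entries; a year with no nonzero values stays NaN
df["C"] = c.fillna(year_mean)
print(df)
```

Because transform keeps everything vectorized and avoids building an intermediate dict, it should scale comfortably to the 10,000+ rows mentioned in the question.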

Answer 2

Here is what I would do. The code below is a rough sketch.

1: Find the average for each year and put it in a dict.

# {year: mean of the nonzero C values for that year}
my_year_dict = df[df['C'] != 0].groupby('A')['C'].mean().to_dict()

2: Use apply with a lambda function:

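# Keep nonzero C values; replace 0 with that year's average (NaN if the year has none)
df['New C'] = df.apply(lambda row: my_year_dict.get(row['A'], float('nan')) if row['C'] == 0 else row['C'], axis=1)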

Hope this is a start!
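
The two steps above can be sketched end to end as follows. This is a minimal sketch assuming the sample data from the question (columns A and C only). It swaps the row-wise apply in step 2 for Series.where plus Series.map, a vectorized alternative that is usually faster than apply on larger frames.

```python
import pandas as pd

# Sample data from the question (columns A and C only)
df = pd.DataFrame({
    "A": [2010, 2020, 2007, 2010, 2020, 2007, 2006, 2010],
    "C": [0, 2, 0, 8, 6, 0, 7, 10],
})

# Step 1: {year: mean of nonzero C values}; 2007 is absent because all its values are 0
my_year_dict = df.loc[df["C"] != 0].groupby("A")["C"].mean().to_dict()

# Step 2 (vectorized): keep C where it is nonzero, otherwise use the year's average;
# map gives NaN for years missing from the dict
df["New C"] = df["C"].where(df["C"] != 0, df["A"].map(my_year_dict))
print(df)
```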

Answer 1: 0 votes, 2021-12-31 18:42:39
Answer 2: -1 votes, 2021-12-31 14:06:28