Backfilling data from one column into another

Question

This is what my data looks like:

colA  colB
 a      1
 a      1
 c      2
 c      2
Nan     1 
 c      1
 a      2
Nan     2

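I want to fill in the NaNs in colA. The result should be as follows:

Row 5 is filled with 'a', because colB = 1 and, overall, 1 in colB maps to a rather than c in colA.

Row 8 is filled with 'c', because colB = 2 and, overall, 2 in colB maps to c rather than a in colA.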

Answer 1

You can use the mode within each group (ignoring ties):

In [11]: df
Out[11]:
  colA  colB
0    a     1
1    a     1
2    c     2
3    c     2
4  NaN     1
5    c     1
6    a     2
7  NaN     2

In [12]: modes = df.groupby('colB')['colA'].transform(lambda x: x.mode().iloc[0])

In [13]: modes
Out[13]:
0    a
1    a
2    c
3    c
4    a
5    a
6    c
7    c
Name: colA, dtype: object

Use fillna to substitute the mode only where the value is NaN:

In [14]: df['colA'].fillna(modes)
Out[14]:
0    a
1    a
2    c
3    c
4    a
5    c
6    a
7    c
Name: colA, dtype: object

In [15]: df['colA'] = df['colA'].fillna(modes)
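
For convenience, the steps above can be collected into one standalone script. This is only a sketch: it rebuilds the question's frame by hand (assuming the "Nan" entries are real missing values, np.nan) and the variable name filled is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# The frame from the question; the missing colA entries are assumed to be np.nan.
df = pd.DataFrame({
    "colA": ["a", "a", "c", "c", np.nan, "c", "a", np.nan],
    "colB": [1, 1, 2, 2, 1, 1, 2, 2],
})

# Most frequent colA value within each colB group, broadcast back to every row.
# mode() skips NaN by default and returns ties in sorted order, so .iloc[0]
# takes the first of any tied values.
modes = df.groupby("colB")["colA"].transform(lambda x: x.mode().iloc[0])

# Only the NaN rows take the group mode; existing values are left untouched.
filled = df["colA"].fillna(modes)
print(filled.tolist())
```

Rows 5 and 7 keep their original values even though they differ from their group's mode, because fillna only touches missing entries.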

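Note: according to the docs, this will raise if no item occurs at least twice (mode() then returns an empty result, so .iloc[0] fails), so you may want to use something more robust in the transform: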

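import numpy as np

def mymode(s):
    """Return the most frequent value of s, falling back to its first element."""
    try:
        return s.mode().iloc[0]
    except IndexError:
        # mode() came back empty: just pick the first element, even though it
        # occurs only once, even if it's NaN
        return s.iloc[0] if len(s) >= 1 else np.nan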
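
As a rough illustration of where the fallback matters, the sketch below passes mymode to transform on a small made-up frame (not the question's data) in which one group has no non-missing values at all. It assumes a pandas version where mode() drops NaN by default, so that group's mode is empty and the except branch is taken.

```python
import numpy as np
import pandas as pd

def mymode(s):
    try:
        return s.mode().iloc[0]
    except IndexError:
        # Empty mode: fall back to the first element (possibly NaN).
        return s.iloc[0] if len(s) >= 1 else np.nan

# Hypothetical data: group 2 contains only missing values in colA.
df = pd.DataFrame({
    "colA": ["a", "a", np.nan, np.nan, np.nan],
    "colB": [1, 1, 1, 2, 2],
})

modes = df.groupby("colB")["colA"].transform(mymode)
filled = df["colA"].fillna(modes)
print(filled)
```

With the plain lambda from above, the all-NaN group would raise IndexError; with mymode its rows simply stay NaN, while the missing value in group 1 is filled with 'a'.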

Answer 1
1 Accepted 2015-03-18 07:56:25