Pandas: fill NaN with column values

Question

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 1, np.nan],
                   'B': [2.2, np.nan, 2.2]})
df

    A       B
0   1.0     2.2
1   1.0     NaN
2   NaN     2.2

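If I want to replace the NaN value in column A with the value that repeats in that column (1), and do the same for column B, which kind of fillna() do I need? Desired result: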

    A       B
0   1.0     2.2
1   1.0     2.2
2   1.0     2.2

I am looking for a general solution, because I actually have thousands of rows. Thanks in advance!

Answer 1

fillna can take a dictionary of values, where the keys are the column names.

Assuming you want to fill each column with its most frequently repeated value, you can compute the dictionary like this:

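import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, np.nan, 2],
    'B': [2.2, np.nan, 2.2, 1.9]
})
# first row of df.mode() as {column name: most frequent value}
fill_dict = df.mode().to_dict(orient='records')[0]
df = df.fillna(value=fill_dict)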
df

     A    B
0  1.0  2.2
1  1.0  2.2
2  1.0  2.2
3  2.0  1.9
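fillna also accepts a Series whose index holds column labels, so the dictionary step can be skipped by passing the first row of `df.mode()` directly. A minimal sketch, assuming a reasonably recent pandas. One caveat: when a column has several equally frequent values, `mode()` returns them in sorted order, so `.iloc[0]` picks the smallest one rather than the one that appears first (the `tie` frame below is hypothetical data illustrating this).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, np.nan, 2],
    'B': [2.2, np.nan, 2.2, 1.9],
})

# df.mode() has one row per mode (NaN is ignored by default);
# its first row holds one most-frequent value for every column.
modes = df.mode().iloc[0]

# A Series passed to fillna is matched to the columns by label.
filled = df.fillna(modes)
print(filled)

# Hypothetical tie: 3 and 1 both appear twice; mode() sorts them,
# so the NaN is filled with 1.0, not with 3.0 (which appears first).
tie = pd.DataFrame({'C': [3, 3, 1, 1, np.nan]})
tie_filled = tie.fillna(tie.mode().iloc[0])
print(tie_filled)
```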

Answer 2

Why not simply:

df.ffill()  # older spelling: df.fillna(method='ffill'), deprecated since pandas 2.1

# df = pd.DataFrame({'A': [1, 1, np.nan, 2], 'B': [2.2, np.nan, 2.2, 1.9]})
# df.ffill()
#      A    B
# 0  1.0  2.2
# 1  1.0  2.2
# 2  1.0  2.2
# 3  2.0  1.9
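Forward filling copies the previous row's value, which coincides with the most frequent value in this example only because each repeated value sits directly above its NaN. A small sketch on hypothetical data showing where the two approaches differ: a leading NaN has no previous value to copy, and a NaN that follows a different value takes that value instead of the column's mode.

```python
import numpy as np
import pandas as pd

# Hypothetical column: the mode is 1.0, but the last NaN follows a 2.
s = pd.Series([np.nan, 1, 1, 2, np.nan])

forward = s.ffill()                    # carries the last seen value downward
by_mode = s.fillna(s.mode().iloc[0])   # uses the most frequent value

print(forward.tolist())  # leading NaN stays NaN, last NaN becomes 2.0
print(by_mode.tolist())  # every NaN becomes 1.0
```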

Answer 3

import itertools
import operator

def most_common(L):
    # L must not contain NaN: NaN compares unequal to everything,
    # which breaks both the sort and the grouping below
    # sort (item, original index) pairs so equal items become adjacent
    SL = sorted((x, i) for i, x in enumerate(L))
    # group the sorted pairs by item
    groups = itertools.groupby(SL, key=operator.itemgetter(0))
    # auxiliary function to get the "quality" of an item: (count, -first index)
    def _auxfun(g):
        item, iterable = g
        count = 0
        min_index = len(L)
        for _, where in iterable:
            count += 1
            min_index = min(min_index, where)
        return count, -min_index
    # pick the highest-count item, breaking ties by earliest position
    return max(groups, key=_auxfun)[0]

Then apply it to each column, dropping the NaN values before counting:

for col in df.columns:
    df[col] = df[col].fillna(most_common(df[col].dropna().tolist()))
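On Python 3.7 and later, the same "highest count, earliest occurrence" rule is available from the standard library: `collections.Counter.most_common` orders elements with equal counts in the order they were first encountered. The sketch below swaps that in for the itertools recipe; `most_common_counter` is a hypothetical helper name, not part of any library. NaN values are dropped before counting so that a missing value can never be chosen as the fill value.

```python
from collections import Counter

import numpy as np
import pandas as pd


def most_common_counter(values):
    """Hypothetical helper: most frequent value, ties go to the one seen first."""
    return Counter(values).most_common(1)[0][0]


df = pd.DataFrame({'A': [1, 1, np.nan], 'B': [2.2, np.nan, 2.2]})

# Drop NaN before counting so a missing value is never picked as the fill.
for col in df.columns:
    df[col] = df[col].fillna(most_common_counter(df[col].dropna().tolist()))

print(df)
```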

3 answers

Answer 1
Score 2, posted 2016-03-23 05:07:01

Answer 2
Score 2, accepted, posted 2016-03-23 08:03:16

Answer 3
Score 0, posted 2016-03-23 05:14:13
