Given the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[1,1,np.nan],
                   'B':[2.2,np.nan,2.2]})
df
A B
0 1.0 2.2
1 1.0 NaN
2 NaN 2.2
If I want to replace the NaN value in column A with the value that repeats in that column (1) and do the same for column B, what sort of fillna() do I need to use?
Desired output:
A B
0 1.0 2.2
1 1.0 2.2
2 1.0 2.2
Looking for a generic solution as I really have thousands of rows. Thanks in advance!
fillna can take a dictionary of values where the keys are the column names.
Assuming you want to fill the columns with the value that is repeated the most, you can compute the dictionary with:
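df = pd.DataFrame({
    'A': [1, 1, np.nan, 2],
    'B': [2.2, np.nan, 2.2, 1.9]
})
# df.mode() ignores NaN; the first row holds the most frequent value of each column
fill_dict = df.mode().to_dict(orient='records')[0]
# the keyword is `value`, not `values`
df = df.fillna(value=fill_dict)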
df
A B
0 1.0 2.2
1 1.0 2.2
2 1.0 2.2
3 2.0 1.9
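As a rough sketch, the same idea can be wrapped in a small helper. The name fill_with_mode is hypothetical, not part of pandas, and the sketch assumes every column has at least one non-NaN value. When several values tie for most frequent, df.mode() lists them in sorted order, so taking the first row picks the smallest one.

```python
import numpy as np
import pandas as pd

def fill_with_mode(frame):
    # Hypothetical helper: fill each column's NaN with that column's mode.
    # frame.mode() ignores NaN by default; row 0 holds the first mode per column.
    modes = frame.mode().iloc[0]
    return frame.fillna(value=modes)

df = pd.DataFrame({
    'A': [1, 1, np.nan, 2],
    'B': [2.2, np.nan, 2.2, 1.9],
})
filled = fill_with_mode(df)
print(filled)
```

Passing the Series of modes works like the dictionary above, because fillna matches its index against the column names.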
Why not simply:
df.ffill()  # same as df.fillna(method='ffill'), which newer pandas versions deprecate
# df = pd.DataFrame({'A': [1, 1, np.nan, 2], 'B': [2.2, np.nan, 2.2, 1.9]})
# df.ffill()
# A B
#0 1.0 2.2
#1 1.0 2.2
#2 1.0 2.2
#3 2.0 1.9
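Note that a forward fill copies the previous valid value in the column, which is not necessarily the most frequent one. It gives the same result here only because each NaN happens to follow the modal value. A minimal sketch with made-up data where the two approaches differ:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the NaN in 'A' follows a 2, but the most frequent value is 1.
df = pd.DataFrame({'A': [1, 1, 2, np.nan]})

forward_filled = df.ffill()                 # copies the previous row's value
mode_filled = df.fillna(df.mode().iloc[0])  # uses the most frequent value

print(forward_filled['A'].tolist())  # [1.0, 1.0, 2.0, 2.0]
print(mode_filled['A'].tolist())     # [1.0, 1.0, 2.0, 1.0]
```

A forward fill also leaves a NaN in the first row untouched, since there is no earlier value to copy.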
import itertools
import operator

def most_common(L):
    # get an iterable of (item, iterable) pairs
    SL = sorted((x, i) for i, x in enumerate(L))
    # print 'SL:', SL
    groups = itertools.groupby(SL, key=operator.itemgetter(0))
    # auxiliary function to get "quality" for an item
    def _auxfun(g):
        item, iterable = g
        count = 0
        min_index = len(L)
        for _, where in iterable:
            count += 1
            min_index = min(min_index, where)
        # print 'item %r, count %r, minind %r' % (item, count, min_index)
        return count, -min_index
    # pick the highest-count/earliest item
    return max(groups, key=_auxfun)[0]
and then just add
# drop NaN before counting (NaN does not sort reliably) and assign the result back
df['A'] = df['A'].fillna(most_common(df['A'].dropna().tolist()))
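A shorter variant of the same idea, sketched with collections.Counter from the standard library. The helper name most_common_value is hypothetical, and the sketch assumes each column has at least one non-NaN value (otherwise the lookup raises IndexError). Like the function above, ties go to the value that appears first.

```python
from collections import Counter

import numpy as np
import pandas as pd

def most_common_value(values):
    # Hypothetical helper: most frequent item; ties go to the item seen first,
    # because Counter.most_common keeps first-encountered order for equal counts.
    return Counter(values).most_common(1)[0][0]

df = pd.DataFrame({'A': [1, 1, np.nan], 'B': [2.2, np.nan, 2.2]})
for column in df.columns:
    # Drop NaN before counting so a missing value is never chosen as the fill value.
    fill_value = most_common_value(df[column].dropna().tolist())
    df[column] = df[column].fillna(fill_value)
print(df)
```

Looping over df.columns applies the fill to every column, which covers the "generic solution" asked for in the question.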