Python pandas: replace NaN values of one column (A) by the mode of the same column (A) with respect to another column in a pandas DataFrame
Here is the dataframe with some NaN values:
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)
df
Output:
Number Street
0 100.0 A
1 NaN B
2 NaN C
3 200.0 D
4 150.0 C
5 NaN D
6 100.0 A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
I want to replace the NaN values of the column 'Number' by the mode of the same column, computed separately for each value of the column 'Street'.
The output I need is:
Number Street
0 100 A
1 120 B
2 150 C
3 200 D
4 150 C
5 200 D
6 100 A
7 120 B
8 110 B
9 210 D
10 120 B
Explanation:
For example, consider row 1, which has a NaN value in the column Number and whose Street is B. That NaN should be replaced by 120.0, the mode of Number among the rows where Street is B: the other values of Number for Street B are 120.0, 110.0 and 120.0 (rows 7, 8 and 10), and their mode is 120.0.
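To make "the mode of Number with respect to Street" concrete, the per-group modes can be inspected directly. This is only a minimal sketch on the question's data; the name modes_by_street is introduced here for illustration.

```python
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

# Series.mode() skips NaN by default and returns a Series of the most
# frequent values; .iat[0] takes the first of them for each Street.
modes_by_street = df.groupby('Street')['Number'].agg(lambda s: s.mode().iat[0])
print(modes_by_street)
```

Each NaN in Number is then expected to take the value listed for its Street.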
Use GroupBy.transform with a lambda function that returns the first mode of each group, and replace the missing values with Series.fillna:
f = lambda x: x.mode().iat[0]
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))
Or:
f = lambda x: x.fillna(x.mode().iat[0])
df['Number'] = df.groupby('Street')['Number'].transform(f)
print(df)
Number Street
0 100.0 A
1 120.0 B
2 150.0 C
3 200.0 D
4 150.0 C
5 200.0 D
6 100.0 A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
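One detail worth checking is what "first mode" means when values tie. Street D has two non-missing values (200 and 210) that each occur once, so both are modes. The sketch below isolates that case, with the standalone Series street_d standing in for the group:

```python
import pandas as pd

# The Number values for Street D in the example: two distinct values and one NaN
street_d = pd.Series([200.0, None, 210.0])

tied_modes = street_d.mode()       # NaN is dropped by default (dropna=True)
print(tied_modes.tolist())         # every tied value, in ascending order
first_mode = tied_modes.iat[0]     # the value the lambda above would pick
print(first_mode)
```

Because the modes come back sorted, taking position 0 picks the smallest of the tied values, which is why row 5 is filled with 200.0 rather than 210.0.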
An error is possible if some group contains only NaN/None values:
IndexError: index 0 is out of bounds for axis 0 with size 0
In that case the solution is:
data = {'Number': [None, None, None, 200, 150, None, None, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)
print(df)
Number Street
0 NaN A
1 NaN B
2 NaN C
3 200.0 D
4 150.0 C
5 NaN D
6 NaN A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
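import numpy as np

# Return NaN for a group with no non-missing value, so .iat[0] is never
# called on an empty mode
f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))
print(df)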
Number Street
0 NaN A
1 120.0 B
2 150.0 C
3 200.0 D
4 150.0 C
5 200.0 D
6 NaN A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
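The IndexError mentioned above comes from mode() returning an empty Series when a group has no non-missing values. A small sketch, with all_missing standing in for the Number values of Street A in the second example:

```python
import numpy as np
import pandas as pd

all_missing = pd.Series([np.nan, np.nan])   # a group that holds only NaN

empty_mode = all_missing.mode()             # NaN is dropped, nothing is left
print(len(empty_mode))

try:
    empty_mode.iat[0]                       # no element at position 0
    raised = False
except IndexError as exc:
    raised = True
    print(exc)
```

The x.notna().any() guard avoids this by never indexing into an empty result, which leaves the rows of such a group as NaN.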
Maybe a bit simpler: since mode returns a Series of the most frequent values, you can impute the NaN values in Number with its first element.
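>>> # group_keys=False keeps the original row index, so the result aligns with df
>>> df['Number'] = df.groupby('Street', group_keys=False)['Number'].apply(lambda x: x.fillna(x.mode()[0]))
# or, with transform:
# df['Number'] = df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0]))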
>>> df
Number Street
0 100.0 A
1 120.0 B
2 150.0 C
3 200.0 D
4 150.0 C
5 200.0 D
6 100.0 A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
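Another solution is based on loc and first_valid_index. Note that it fills each NaN with the first non-missing value of its group rather than the mode; on this data the two happen to coincide. Wrapping the result in fillna keeps the existing values untouched:
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(lambda s: s.loc[s.first_valid_index()]))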
Or (assign returns a new DataFrame instead of modifying df in place):
df.assign(Number=df.groupby('Street', group_keys=False)['Number'].apply(lambda x: x.fillna(x.mode()[0])))
or
df.assign(Number=df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0])))
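The variants above can be collected into one small function. This is a sketch only: fill_with_group_mode and its parameter names are hypothetical, not part of pandas. The final integer cast is an assumption based on the desired output in the question, which shows 100 rather than 100.0.

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(frame, value_col, group_col):
    """Hypothetical helper: fill NaN in value_col with the first mode of its group."""
    def first_mode(s):
        m = s.mode()
        # A group with only NaN has an empty mode; leave it as NaN
        return m.iat[0] if not m.empty else np.nan

    group_mode = frame.groupby(group_col)[value_col].transform(first_mode)
    return frame.assign(**{value_col: frame[value_col].fillna(group_mode)})

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

filled = fill_with_group_mode(df, 'Number', 'Street')

# Cast to integers only when nothing is left missing, since NaN cannot be
# stored in a plain integer column
if filled['Number'].notna().all():
    filled['Number'] = filled['Number'].astype(int)
print(filled)
```

The original df is left unchanged, because assign returns a new DataFrame.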