
Python pandas: replace NaN values of one column (A) by the mode of the same column (A) with respect to another column in a pandas dataframe

Here is a dataframe with some NaN values:

import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)
df

Output:

    Number  Street
0   100.0   A
1   NaN     B
2   NaN     C
3   200.0   D
4   150.0   C
5   NaN     D
6   100.0   A
7   120.0   B
8   110.0   B
9   210.0   D
10  120.0   B

I want to replace the NaN values in the column 'Number' with the mode of that same column, computed separately for each value of the column 'Street'.

The output I need is:

    Number  Street
0   100       A
1   120       B
2   150       C
3   200       D
4   150       C
5   200       D
6   100       A
7   120       B
8   110       B
9   210       D
10  120       B

Explanation:

For example, consider row 1, which has a NaN value in the column Number and whose Street is B. That NaN should be replaced by 120.0, the mode of Number within Street B: the other Number values for Street B are 120.0, 110.0 and 120.0 (rows 7, 8 and 10), and their mode is 120.0.
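The reasoning for Street B can be checked directly. The snippet below is only a sketch that rebuilds the question's dataframe; the names `b_values` and `b_mode` are introduced here for illustration and are not part of the question.

```python
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

# Non-missing Number values observed for Street B (rows 7, 8 and 10).
b_values = df.loc[df['Street'] == 'B', 'Number'].dropna()
print(b_values.tolist())   # [120.0, 110.0, 120.0]

# Series.mode() returns a Series of the most frequent values; take the first.
b_mode = b_values.mode().iat[0]
print(b_mode)              # 120.0
```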

Use GroupBy.transform with a lambda function that returns the first mode of each group, and replace the missing values with Series.fillna:

f = lambda x: x.mode().iat[0]
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))

Or:

f = lambda x: x.fillna(x.mode().iat[0])
df['Number'] = df.groupby('Street')['Number'].transform(f)

print (df)
    Number Street
0    100.0      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6    100.0      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B
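The same result can also be reached with an explicit lookup table, which makes the intermediate per-street modes visible. This is a sketch of an alternative, not the answer's own method; `modes` and `filled` are names assumed for illustration.

```python
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

# One value per street: the first mode of Number in that group.
modes = df.groupby('Street')['Number'].agg(lambda x: x.mode().iat[0])
print(modes.to_dict())   # {'A': 100.0, 'B': 120.0, 'C': 150.0, 'D': 200.0}

# Look up each row's street in the table and use it only where Number is NaN.
filled = df['Number'].fillna(df['Street'].map(modes))
print(filled.tolist())
```

Street D has two equally frequent values (200.0 and 210.0); mode() returns them sorted, so taking the first element gives 200.0.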

This raises an error if some group contains only NaN/None values:

IndexError: index 0 is out of bounds for axis 0 with size 0
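The error comes from mode() returning an empty Series when every value is missing, so there is no first element to take. A minimal reproduction, using a made-up Series named `all_missing`:

```python
import numpy as np
import pandas as pd

# A group in which every value is missing.
all_missing = pd.Series([np.nan, np.nan])

# mode() drops NaN by default, so nothing is left to count.
empty_mode = all_missing.mode()
print(len(empty_mode))   # 0

# Asking for the first element of an empty Series raises IndexError.
raised = False
try:
    empty_mode.iat[0]
except IndexError:
    raised = True
print(raised)            # True
```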

The solution is to return NaN for such groups:

import numpy as np
import pandas as pd

data = {'Number': [None, None, None, 200, 150, None, None, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)
print (df)
    Number Street
0      NaN      A
1      NaN      B
2      NaN      C
3    200.0      D
4    150.0      C
5      NaN      D
6      NaN      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B

f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))
print (df)
    Number Street
0      NaN      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6      NaN      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B
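Rows 0 and 6 stay NaN because Street A has no observed value at all. If that is not acceptable, one possible follow-up (an assumption that goes beyond the answer above) is a second fillna with the mode of the whole column. The helper name `first_mode` is hypothetical.

```python
import numpy as np
import pandas as pd

data = {'Number': [None, None, None, 200, 150, None, None, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

def first_mode(s):
    """Return the first mode of s, or NaN when s has no non-missing values."""
    m = s.mode()
    return m.iat[0] if not m.empty else np.nan

# Per-street mode, broadcast back to every row of the group.
group_mode = df.groupby('Street')['Number'].transform(first_mode)

# Fallback for streets without any value: the mode of the whole column.
overall_mode = first_mode(df['Number'])

filled = df['Number'].fillna(group_mode).fillna(overall_mode)
print(filled.tolist())
```

Whether a global fallback is appropriate depends on the data; leaving NaN in place, as the answer does, is the more conservative choice.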

Maybe a bit simpler: mode returns a Series (there can be more than one mode), so you can impute the NaN values in Number by taking its first element.
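To see why the first element has to be picked explicitly, here is a small sketch using the two non-missing Number values of Street D, which are tied. The name `d_values` is only for illustration.

```python
import pandas as pd

# Non-missing Number values for Street D: each occurs once, so both are modes.
d_values = pd.Series([200.0, 210.0])

all_modes = d_values.mode()
print(all_modes.tolist())   # [200.0, 210.0]

# The result is sorted and indexed from 0, so [0] picks the smallest mode.
print(all_modes[0])         # 200.0
```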

Solution 1:

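>>> # group_keys=False keeps the original row index, so the result aligns with df
>>> df['Number'] = df.groupby('Street', group_keys=False)['Number'].apply(lambda x: x.fillna(x.mode()[0]))
>>> # Equivalent: df['Number'] = df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0]))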
>>> df
    Number Street
0    100.0      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6    100.0      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B

Solution 2:

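Another solution is based on loc and first_valid_index. Note that it uses the first non-missing value of each group rather than the mode; the two happen to agree for this data. Wrapping the result in fillna leaves the existing values unchanged:

df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(lambda s: s.loc[s.first_valid_index()]))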

Or, with assign:

df.assign(Number=df.groupby('Street', group_keys=False)['Number'].apply(lambda x: x.fillna(x.mode()[0])))

Or:

df.assign(Number=df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0])))
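Unlike the `df['Number'] = ...` form, assign returns a new dataframe and leaves df untouched, so the result has to be captured. A short sketch on the question's data; `result` is a name chosen here for illustration.

```python
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

# assign builds a copy with the Number column replaced by the filled version.
result = df.assign(
    Number=df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0]))
)

# The original still has its three missing values; the copy has none.
missing_before = int(df['Number'].isna().sum())
missing_after = int(result['Number'].isna().sum())
print(missing_before, missing_after)   # 3 0
```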


