How to fill NaN values based on the previous column

Question

I have an initial column (A) with no missing data but with repeated values. How do I fill the missing values in the next column (B) so that each value in A is always paired with the same value in B? I would also like any other columns (C) to remain unchanged.

For example, this is what I have:

    A    B     C
1   1    20    4
2   2    NaN   8
3   3    NaN   2
4   2    30    9
5   3    40    1
6   1    NaN   3

And this is what I want:

    A    B     C
1   1    20    4
2   2    30*   8
3   3    40*   2
4   2    30    9
5   3    40    1
6   1    20*   3

Asterisks mark the filled values.

This needs to scale to a very large dataframe.

Additionally, if a value in the left column had more than one value in the right column across separate observations, how would I fill with the mean?

Answer 1

You can use groupby on 'A' and then first to find the first corresponding value in 'B' for each group (first skips NaN).

import pandas as pd

df = pd.DataFrame({'A':[1,2,3,2,3,1], 
                   'B':[20, None, None, 30, 40, None], 
                   'C': [4,8,2,9,1,3]})

# find the first non-NaN 'B' value for each 'A'
lookup = df.groupby('A')['B'].first()

# flag rows where 'B' is NaN
nan_mask = df['B'].isnull()

# fill the NaN rows of 'B' with the lookup value for their 'A';
# a single .loc call avoids chained assignment, which may not write back to df
df.loc[nan_mask, 'B'] = df.loc[nan_mask, 'A'].map(lookup)

print(df)

Which outputs:

   A     B  C
0  1  20.0  4
1  2  30.0  8
2  3  40.0  2
3  2  30.0  9
4  3  40.0  1
5  1  20.0  3
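
The same lookup can also be expressed without an explicit mask. The following is only a sketch of one possible variant, not part of the original answer: it assumes that transform('first') is acceptable for your pandas version and reuses the example frame from above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 2, 3, 1],
                   'B': [20, None, None, 30, 40, None],
                   'C': [4, 8, 2, 9, 1, 3]})

# transform('first') broadcasts the first non-NaN 'B' of each 'A' group
# back onto every row of that group, aligned with the original index
first_b = df.groupby('A')['B'].transform('first')

# fillna only touches the rows where 'B' is missing
df['B'] = df['B'].fillna(first_b)

print(df)
```

Because both steps are vectorized, this form avoids a Python-level loop over rows, which may matter for a very large dataframe.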

If there are many NaN values in 'B', you might want to exclude them before you use groupby.

import pandas as pd

df = pd.DataFrame({'A':[1,2,3,2,3,1], 
                   'B':[20, None, None, 30, 40, None], 
                   'C': [4,8,2,9,1,3]})

# Flag rows where 'B' is NaN
nan_mask = df['B'].isnull()

# Find the first 'B' value for each 'A', dropping the NaN rows up front
lookup = df.loc[~nan_mask].groupby('A')['B'].first()

# Fill the NaN rows of 'B' through a single .loc call (no chained assignment)
df.loc[nan_mask, 'B'] = df.loc[nan_mask, 'A'].map(lookup)

print(df)
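
The question also asks about filling with the mean when one value of 'A' is observed with several different values of 'B'. A minimal sketch of that case follows; the extra row with A == 2 and B == 50 is hypothetical data added only for illustration, and the approach assumes that the per-group mean of the observed values is the intended fill.

```python
import pandas as pd

# Hypothetical data: A == 2 is observed with two different 'B' values (30 and 50)
df = pd.DataFrame({'A': [1, 2, 3, 2, 3, 1, 2],
                   'B': [20, None, None, 30, 40, None, 50],
                   'C': [4, 8, 2, 9, 1, 3, 7]})

# Per-group mean of 'B' (NaN is skipped), broadcast to every row of the group
group_mean = df.groupby('A')['B'].transform('mean')

# Only the missing entries are replaced; observed values stay as they are
df['B'] = df['B'].fillna(group_mean)

print(df)
```

Here the missing 'B' for A == 2 becomes 40.0, the mean of 30 and 50, while the observed 30 and 50 are left untouched.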

Answer 2

You could call sort_values first and then forward fill column B based on column A. One way to implement this:

import pandas as pd
import numpy as np

x = {'A':[1,2,3,2,3,1],
     'B':[20,np.nan,np.nan,30,40,np.nan],
     'C':[4,8,2,9,1,3]}

df = pd.DataFrame(x)

# Sort by A then B so that, within each A group, NaN rows come last,
# then forward fill B. Assigning back aligns on the index, so the
# original row order of the dataframe is preserved.
df['B'] = df.sort_values(by=['A','B'])['B'].ffill()
print(df)

Output will be:

Original data:

   A     B  C
0  1  20.0  4
1  2   NaN  8
2  3   NaN  2
3  2  30.0  9
4  3  40.0  1
5  1   NaN  3

Updated data:

   A     B  C
0  1  20.0  4
1  2  30.0  8
2  3  40.0  2
3  2  30.0  9
4  3  40.0  1
5  1  20.0  3
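
One caveat with sorting and forward filling: if some value of 'A' has no observed 'B' at all, the fill carries over from the previous group. The sketch below shows this with a hypothetical extra row (A == 4) and compares it with filling inside each group; the groupby variant is offered as a possible alternative, not as the method of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: A == 4 never has an observed 'B'
df = pd.DataFrame({'A': [1, 2, 3, 2, 3, 1, 4],
                   'B': [20, np.nan, np.nan, 30, 40, np.nan, np.nan],
                   'C': [4, 8, 2, 9, 1, 3, 5]})

# Sort-then-ffill: the A == 4 row borrows its value from the A == 3 group
leaky = df.sort_values(by=['A', 'B'])['B'].ffill()

# Filling inside each group: ffill then bfill never crosses group boundaries,
# so a group with no observed 'B' stays NaN
grouped = df.groupby('A')['B'].transform(lambda s: s.ffill().bfill())

print(leaky.sort_index())
print(grouped)
```

Whether a group without any observed value should stay NaN or receive some default is a design choice that depends on the data.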

Answer 1: score 2, accepted, 2020-02-12 13:04:23

Answer 2: score 0, 2020-09-03 04:51:57
