Pandas - Replace NaNs in a column with the mean of specific group

Question

I am working with data like the following. The dataframe is sorted by the date:

category  value  Date
0         1      24/5/2019     
1         NaN    24/5/2019         
1         1      26/5/2019       
2         2      1/6/2019      
1         2      23/7/2019       
2         NaN    18/8/2019         
2         3      20/8/2019       
7         3      1/9/2019 
1         NaN    12/9/2019       
2         NaN    13/9/2019

I would like to replace each NaN with the mean of the previous values in that same category.

What is the best way to do this in pandas?

Some approaches I considered:

1) This little riff:

   df['mean'] = df.groupby('category')['value'].apply(lambda x: x.shift().expanding().mean())

source

This gets me the correct means, but in another column, and it does not replace the NaNs.

2) This riff replaces the NaNs with the average of the columns:

df = df.groupby(df.columns, axis = 1).transform(lambda x: x.fillna(x.mean()))

Source 2

Neither of these gives exactly what I want. If someone could guide me on this, it would be much appreciated!

Answer 1

You can fill the NaNs in value from a new Series built with shift + expanding + mean. The first value of category 1 is not replaced, because no previous non-NaN values exist for that category:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)  # dates are day/month/year
# group_keys=False keeps the original row index so fillna can align on it
s = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.shift().expanding().mean())
df['value'] = df['value'].fillna(s)
print(df)
   category  value       Date
0         0    1.0 2019-05-24
1         1    NaN 2019-05-24
2         1    1.0 2019-05-26
3         2    2.0 2019-06-01
4         1    2.0 2019-07-23
5         2    2.0 2019-08-18
6         2    3.0 2019-08-20
7         7    3.0 2019-09-01
8         1    1.5 2019-09-12
9         2    2.5 2019-09-13
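
To see why the first row of category 1 stays NaN, it may help to walk through shift + expanding + mean on that one group by hand. The sketch below is only an illustration: the Series is typed in manually from the question's data, and the names group, shifted, previous_mean and filled are hypothetical.

```python
import numpy as np
import pandas as pd

# The 'value' entries for category 1, keeping their original row labels.
group = pd.Series([np.nan, 1.0, 2.0, np.nan], index=[1, 2, 4, 8])

# Step 1: shift down one row so each position only sees earlier rows.
shifted = group.shift()

# Step 2: expanding().mean() averages the non-NaN values seen so far.
previous_mean = shifted.expanding().mean()

# Step 3: fill the gaps; row 1 stays NaN because nothing precedes it.
filled = group.fillna(previous_mean)
print(filled)
```

Row 8 is filled with the mean of rows 2 and 4, while row 1 has no earlier value in its group to average, so it is left as NaN.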

Answer 2

You can use pandas.Series.fillna to replace NaN values:

df['value'] = df['value'].fillna(
    df.groupby('category')['value'].transform(lambda x: x.shift().expanding().mean())
)
print(df)

   category  value       Date
0         0    1.0  24/5/2019
1         1    NaN  24/5/2019
2         1    1.0  26/5/2019
3         2    2.0   1/6/2019
4         1    2.0  23/7/2019
5         2    2.0  18/8/2019
6         2    3.0  20/8/2019
7         7    3.0   1/9/2019
8         1    1.5  12/9/2019
9         2    2.5  13/9/2019
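
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the frame is already sorted by date, as the question states, and rebuilds the sample data by hand; the name prev_mean is hypothetical. Because transform returns a result with the same index as the input, fillna can align it with the original column directly.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumed already sorted by date).
df = pd.DataFrame({
    "category": [0, 1, 1, 2, 1, 2, 2, 7, 1, 2],
    "value": [1, np.nan, 1, 2, 2, np.nan, 3, 3, np.nan, np.nan],
    "Date": ["24/5/2019", "24/5/2019", "26/5/2019", "1/6/2019", "23/7/2019",
             "18/8/2019", "20/8/2019", "1/9/2019", "12/9/2019", "13/9/2019"],
})

# Per-category mean of all earlier rows; same index as df.
prev_mean = df.groupby("category")["value"].transform(
    lambda x: x.shift().expanding().mean()
)

# Only the NaN entries are replaced; existing values are left alone.
df["value"] = df["value"].fillna(prev_mean)
print(df)
```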
