Fill values of a column based on mean of another column

Question

I have a pandas DataFrame. I'm trying to fill the NaNs in the Price column with the average price of the corresponding level in the Section column. What's an efficient and elegant way to do this? My data looks something like this:

Name   Sex  Section  Price
Joe     M      1       2
Bob     M      1       nan
Nancy   F      2       5
Grace   F      1       6
Jen     F      2       3
Paul    M      2       nan

Answer 1

You could combine groupby, transform, and mean. Note that I've modified your example because otherwise both Sections have the same mean value. Starting from

In [21]: df
Out[21]: 
    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    NaN
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    NaN

we can use

df["Price"] = df["Price"].fillna(df.groupby("Section")["Price"].transform("mean"))

to produce

In [23]: df
Out[23]: 
    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    4.0
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    7.5

This works because we can compute the mean by Section:

In [29]: df.groupby("Section")["Price"].mean()
Out[29]: 
Section
1    4.0
2    7.5
Name: Price, dtype: float64

and broadcast this back up to a full Series that we can pass to fillna() using transform:

In [30]: df.groupby("Section")["Price"].transform("mean")
Out[30]: 
0    4.0
1    4.0
2    7.5
3    4.0
4    7.5
5    7.5
Name: Price, dtype: float64
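
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the modified example data shown above; the variable name `section_means` is mine and only there for readability.

```python
import numpy as np
import pandas as pd

# Rebuild the modified example frame (Jen's Price changed to 10.0).
df = pd.DataFrame({
    "Name": ["Joe", "Bob", "Nancy", "Grace", "Jen", "Paul"],
    "Sex": ["M", "M", "F", "F", "F", "M"],
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})

# One mean per row, aligned to df's index (NaNs are skipped by mean).
section_means = df.groupby("Section")["Price"].transform("mean")

# Only the missing prices are replaced; existing values are left untouched.
df["Price"] = df["Price"].fillna(section_means)

print(df)
```

Because transform returns a Series with the same index as df, fillna can align it row by row without any explicit join.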

Answer 2

`pandas` surgical but slower

Refer to @DSM's answer for a quicker `pandas` solution

This is a more surgical approach that may provide some useful perspective.

use groupby
- calculate the mean for each Section
```
means = df.groupby('Section').Price.mean()
```
identify nulls
- use isnull to build a boolean mask for slicing
```
nulls = df.Price.isnull()
```
use map
- slice the Section column to just those rows with a null Price, then map each Section to its mean
```
fills = df.Section[nulls].map(means)
```
use loc
- fill in the spots in df only where the nulls are
```
df.loc[nulls, 'Price'] = fills
```

All together

means = df.groupby('Section').Price.mean()
nulls = df.Price.isnull()
fills = df.Section[nulls].map(means)
df.loc[nulls, 'Price'] = fills

print(df)

    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    4.0
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    7.5
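
As a sanity check, the sketch below compares this map-based approach with the groupby/transform one-liner on the same example data. The helper names (`make_df`, `fill_with_transform`, `fill_with_map`) are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

def make_df():
    # Example data with Jen's Price set to 10.0, as in the answers above.
    return pd.DataFrame({
        "Name": ["Joe", "Bob", "Nancy", "Grace", "Jen", "Paul"],
        "Sex": ["M", "M", "F", "F", "F", "M"],
        "Section": [1, 1, 2, 1, 2, 2],
        "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
    })

def fill_with_transform(df):
    # Broadcast each Section's mean to every row, then fill the gaps.
    out = df.copy()
    out["Price"] = out["Price"].fillna(
        out.groupby("Section")["Price"].transform("mean")
    )
    return out

def fill_with_map(df):
    # Look up the Section mean only for the rows whose Price is missing.
    out = df.copy()
    means = out.groupby("Section")["Price"].mean()
    nulls = out["Price"].isnull()
    out.loc[nulls, "Price"] = out.loc[nulls, "Section"].map(means)
    return out

a = fill_with_transform(make_df())
b = fill_with_map(make_df())
print(a["Price"].equals(b["Price"]))
```

Both helpers work on a copy, so the input frame is left unchanged and the two results can be compared directly.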

Answer 3

By "corresponding level" I am assuming you mean rows with an equal Section value.

If so, you can solve this by looping over the distinct Section values:

for section_value in sorted(set(df.Section)):
    # rows belonging to the current Section
    in_section = df['Section'] == section_value
    df.loc[in_section, 'Price'] = df.loc[in_section, 'Price'].fillna(df.loc[in_section, 'Price'].mean())

hope it helps! peace

solution1
5 ACCEPTED 2017-01-26 18:45:50

solution2
1 2017-01-26 20:00:30

solution3
0 2017-01-26 18:33:43
