
Fill values of a column based on mean of another column

I have a pandas DataFrame. I'm trying to fill the NaNs in the Price column with the average price of the corresponding level in the Section column. What's an efficient and elegant way to do this? My data looks something like this:

Name   Sex  Section  Price
Joe     M      1       2
Bob     M      1       nan
Nancy   F      2       5
Grace   F      1       6
Jen     F      2       3
Paul    M      2       nan

You could combine groupby, transform, and mean. Note that I've modified your example because otherwise both Sections have the same mean value. Starting from

In [21]: df
Out[21]: 
    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    NaN
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    NaN

we can use

df["Price"] = df["Price"].fillna(df.groupby("Section")["Price"].transform("mean"))

to produce

In [23]: df
Out[23]: 
    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    4.0
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    7.5

This works because we can compute the mean by Section:

In [29]: df.groupby("Section")["Price"].mean()
Out[29]: 
Section
1    4.0
2    7.5
Name: Price, dtype: float64

and broadcast this back up to a full Series, which we can pass to fillna(), using transform:

In [30]: df.groupby("Section")["Price"].transform("mean")
Out[30]: 
0    4.0
1    4.0
2    7.5
3    4.0
4    7.5
5    7.5
Name: Price, dtype: float64
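
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the modified example above, and the variable name section_means is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above (Jen's price set to 10 so the section means differ).
df = pd.DataFrame({
    "Name": ["Joe", "Bob", "Nancy", "Grace", "Jen", "Paul"],
    "Sex": ["M", "M", "F", "F", "F", "M"],
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})

# One value per row: the mean Price of that row's Section (NaNs are skipped when averaging).
section_means = df.groupby("Section")["Price"].transform("mean")

# fillna only touches the missing prices; existing values are left alone.
df["Price"] = df["Price"].fillna(section_means)
print(df)
```

Because transform returns a Series with the same index as df, fillna can align it row by row without any explicit merge.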

pandas surgical but slower

Refer to @DSM's answer for a quicker pandas solution

This is a more surgical approach that may provide some perspective, possibly useful.

  • use groupby

    • calculate our mean for each Section

       means = df.groupby('Section').Price.mean() 
  • identify nulls

    • use isnull to build a boolean mask for slicing

       nulls = df.Price.isnull() 
  • use map

    • slice the Section column to limit to just those rows with null Price

       fills = df.Section[nulls].map(means) 
  • use loc

    • fill in the spots in df only where nulls are

       df.loc[nulls, 'Price'] = fills 

All together

means = df.groupby('Section').Price.mean()
nulls = df.Price.isnull()
fills = df.Section[nulls].map(means)
df.loc[nulls, 'Price'] = fills

print(df)

    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    4.0
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    7.5
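
If the steps above are needed in more than one place, they could be wrapped in a small helper. This is only a sketch: the function name fill_by_group_mean and its parameters are hypothetical, and it works on a copy so the original frame is left untouched.

```python
import numpy as np
import pandas as pd

def fill_by_group_mean(frame, group_col, value_col):
    """Return a copy of frame with NaNs in value_col replaced by the group mean.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = frame.copy()
    means = out.groupby(group_col)[value_col].mean()  # one mean per group
    nulls = out[value_col].isnull()                   # boolean mask of missing rows
    # Look up each missing row's group mean and write it back only at those rows.
    out.loc[nulls, value_col] = out.loc[nulls, group_col].map(means)
    return out

df = pd.DataFrame({
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})
filled = fill_by_group_mean(df, "Section", "Price")
print(filled)
```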

By "corresponding level" I am assuming you mean rows with an equal Section value.

If so, you can solve this with

for section_value in sorted(set(df.Section)):
    mask = df['Section'] == section_value
    df.loc[mask, 'Price'] = df.loc[mask, 'Price'].fillna(df.loc[mask, 'Price'].mean())

hope it helps! peace
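
As a sanity check, the loop can be compared against the groupby/transform one-liner from the first answer. The sketch below assumes the same sample data as the earlier answers (Section and Price columns only) and runs the loop on a copy named looped, which is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})

# Reference result from the groupby/transform approach.
expected = df["Price"].fillna(df.groupby("Section")["Price"].transform("mean"))

# Loop approach, applied to a copy so df itself stays unchanged.
looped = df.copy()
for section_value in sorted(set(looped["Section"])):
    mask = looped["Section"] == section_value
    looped.loc[mask, "Price"] = looped.loc[mask, "Price"].fillna(
        looped.loc[mask, "Price"].mean()
    )

# Raises AssertionError if the two approaches disagree.
pd.testing.assert_series_equal(looped["Price"], expected)
print(looped)
```

Both give the same values here; the loop does one pass per distinct Section, so it will likely be slower when there are many sections.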


 