
How to deal with NaN values coming from pct_change when another column's value differs from the previous row

I have a dataframe (df) like so:

Year |  Name  |  Count
2017   John       1
2018   John       2
2019   John       3
2017   Fred       1
2018   Fred       2
2019   Fred       3

Applying the code below gives me NaN values in the first row of each group. How can I replace each NaN with the average percentage change for that group? For example, John's values are 1.0 and 0.5, so his NaN should be replaced with 0.75 = (1.0 + 0.5) / 2.

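# Group consecutive runs of the same Name and compute the percentage change within each run.
# Calling pct_change() directly on the groupby keeps the original row index.
df['pct_chg'] = df.groupby([df.Name.ne(df.Name.shift()).cumsum(), 'Name'])['Count'].pct_change()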
print(df)

   Year  Name  Count  pct_chg
0  2017  John      1      NaN
1  2018  John      2      1.0
2  2019  John      3      0.5
3  2017  Fred      1      NaN
4  2018  Fred      2      1.0
5  2019  Fred      3      0.5

First, create a new column containing the average value of each group, as in the example below:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'group': [1,1,1,2,2,2],
    'value': [None, 0.5, 1, None, 0.75, 0.25]
})
# mean() skips NaN by default, so each group's average uses only its non-null values
df['avg_value'] = df.groupby('group')['value'].transform('mean')

Then use np.where to fill the value column conditionally: where value is null, take avg_value; otherwise keep the existing value.

df['value'] = np.where(
    df['value'].isna(),
    df['avg_value'],
    df['value']
)
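
Applied to the dataframe from the question, the same idea might look like the sketch below. It assumes the groups are consecutive runs of the same Name, as in the question's code; the names run_id and group_mean are introduced here only for illustration, and fillna is used in place of np.where as an equivalent way to fill the gaps.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019, 2017, 2018, 2019],
    'Name': ['John', 'John', 'John', 'Fred', 'Fred', 'Fred'],
    'Count': [1, 2, 3, 1, 2, 3],
})

# Label each consecutive run of the same Name (hypothetical helper name)
run_id = df['Name'].ne(df['Name'].shift()).cumsum()

# Percentage change within each run; the first row of every run is NaN
df['pct_chg'] = df.groupby(run_id)['Count'].pct_change()

# Per-run mean of the non-null changes, broadcast back to every row
group_mean = df.groupby(run_id)['pct_chg'].transform('mean')

# Replace only the NaN entries with their run's mean
df['pct_chg'] = df['pct_chg'].fillna(group_mean)

print(df)
```

With this data, the first row of each group becomes 0.75 and the other rows keep their original 1.0 and 0.5.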


 