I have a DataFrame (df) with 10 columns. The index holds dates, is sorted by date, and contains many repeated dates. The columns that matter for this problem are df['Weight'] and df['Test'].
Here is an example showing those 2 columns for a single index value (1/21/2017); in reality there are multiple dates, each with multiple weights.
           Weight  Test
1/21/2017  0.1     NaN
1/21/2017  0.04    0.04
1/21/2017  0.03    NaN
1/21/2017  0.02    NaN
1/21/2017  0.2     0.2
1/21/2017  0.001   NaN
1/21/2017  0.1     0.1
1/21/2017  0.21    0.21
1/21/2017  0.003   NaN
1/21/2017  0.01    0.01
1/21/2017  0.04    0.04
1/21/2017  0.005   NaN
1/21/2017  0.05    0.05
1/21/2017  0.1     NaN
1/21/2017  0.091   NaN
df['Weight'] sums to 1 for each unique date in the index.
I have created a Test column which only shows the weight if a condition is satisfied.
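For anyone who wants to reproduce the setup, here is a minimal sketch that rebuilds the single-date sample above and checks that the weights sum to 1 per date. The `keep` list is a hypothetical stand-in for the unspecified condition; it simply marks the rows where Test is shown in the table.

```python
import numpy as np
import pandas as pd

# Weights for the single sample date, copied from the table above.
weights = [0.1, 0.04, 0.03, 0.02, 0.2, 0.001, 0.1, 0.21,
           0.003, 0.01, 0.04, 0.005, 0.05, 0.1, 0.091]
# Hypothetical stand-in for the real condition: True where Test shows a value.
keep = [False, True, False, False, True, False, True, True,
        False, True, True, False, True, False, False]

df = pd.DataFrame(
    {"Weight": weights,
     "Test": np.where(keep, weights, np.nan)},  # weight if kept, else NaN
    index=["1/21/2017"] * len(weights),          # repeated date index
)

# Sum of Weight per unique index value; each total should be 1
# up to floating-point rounding.
totals = df.groupby(level=0)["Weight"].sum()
assert np.allclose(totals, 1.0)
```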
Now I am trying to create a column df['adjusted_weight'] based on the Test column. Where df['Test'] is NaN, the weight in df['Weight'] should be multiplied by 0.75 and assigned to df['adjusted_weight']. For the remaining rows of the same date, where df['Test'] is not NaN, the weights should be adjusted upwards pro rata and assigned to df['adjusted_weight'], so that the sum of df['adjusted_weight'] for any date is 1.
I would like it to be flexible, so that I can also multiply the weights by 0.5 or 0.75, with the pro rata adjustment applied to the rest.
Thanks all so much for the help and support.
Best wishes.
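import numpy as np

def bool_scale(df, col, cond, scale):
    # True where the condition column holds a value (is not NaN)
    cond = df[cond].notnull().values
    w = df[col].values
    # Total weight of the rows that will be multiplied by `scale`
    w_up = w[cond].sum()
    # Flagged rows are scaled; the rest are rescaled pro rata so the sum stays 1
    return df.assign(
        adjusted_weight=np.where(
            cond, w * scale, w / (1 - w_up) * (1 - scale * w_up)))

bool_scale(df, 'Weight', 'Test', .75)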
           Weight  Test  adjusted_weight
1/21/2017   0.100   NaN         0.146429
1/21/2017   0.040  0.04         0.030000
1/21/2017   0.030   NaN         0.043929
1/21/2017   0.020   NaN         0.029286
1/21/2017   0.200  0.20         0.150000
1/21/2017   0.001   NaN         0.001464
1/21/2017   0.100  0.10         0.075000
1/21/2017   0.210  0.21         0.157500
1/21/2017   0.003   NaN         0.004393
1/21/2017   0.010  0.01         0.007500
1/21/2017   0.040  0.04         0.030000
1/21/2017   0.005   NaN         0.007321
1/21/2017   0.050  0.05         0.037500
1/21/2017   0.100   NaN         0.146429
1/21/2017   0.091   NaN         0.133250
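To see why the adjusted weights still sum to 1, the expression inside np.where can be written out as follows. This is only a restatement of the code, assuming the weights for the date sum to 1 and the total S of the scaled rows is strictly less than 1; the symbols s, S, w_i and a_i are introduced here for illustration.

```latex
% s = scale, S = total weight of the rows where Test is not NaN (w_up in the code)
a_i =
\begin{cases}
  s \, w_i, & \text{Test}_i \text{ is not NaN} \\[4pt]
  w_i \, \dfrac{1 - s S}{1 - S}, & \text{Test}_i \text{ is NaN}
\end{cases}
\qquad
\sum_i a_i = s S + (1 - S) \, \frac{1 - s S}{1 - S} = 1
```

For the sample date, S = 0.65 and s = 0.75, so the NaN rows are multiplied by 0.5125 / 0.35 ≈ 1.464286, which turns 0.100 into 0.146429 as in the output above.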
You can apply it to each date separately with a groupby on the index:
kws = dict(col='Weight', cond='Test', scale=.75)
df.groupby(level=0).apply(bool_scale, **kws)
                     Weight  Test  adjusted_weight
1/21/2017 1/21/2017   0.100   NaN         0.146429
          1/21/2017   0.040  0.04         0.030000
          1/21/2017   0.030   NaN         0.043929
          1/21/2017   0.020   NaN         0.029286
          1/21/2017   0.200  0.20         0.150000
          1/21/2017   0.001   NaN         0.001464
          1/21/2017   0.100  0.10         0.075000
          1/21/2017   0.210  0.21         0.157500
          1/21/2017   0.003   NaN         0.004393
          1/21/2017   0.010  0.01         0.007500
          1/21/2017   0.040  0.04         0.030000
          1/21/2017   0.005   NaN         0.007321
          1/21/2017   0.050  0.05         0.037500
          1/21/2017   0.100   NaN         0.146429
          1/21/2017   0.091   NaN         0.133250
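Note that bool_scale multiplies the rows where Test has a value by scale and pro-rates the NaN rows upwards, whereas the question asks for the NaN rows to be the ones multiplied by 0.75. Below is a rough sketch of a variant that lets you choose which side is scaled and that keeps the original index by using groupby(...).transform instead of apply. The function name adjust_weights, the scale_null flag, and the two-date sample (including the date 1/22/2017) are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def adjust_weights(df, col="Weight", cond="Test", scale=0.75, scale_null=False):
    """Scale one set of rows by `scale` and pro-rate the rest, per index value.

    scale_null=False scales rows where `cond` has a value (as bool_scale does);
    scale_null=True scales rows where `cond` is NaN (as the question describes).
    """
    w = df[col]
    # Boolean mask of the rows that get multiplied by `scale`.
    is_null = df[cond].isna().to_numpy()
    mask = is_null if scale_null else ~is_null
    # Per-date totals, broadcast back onto every row of that date.
    total = w.groupby(level=0).transform("sum")
    scaled_total = w.where(mask, 0.0).groupby(level=0).transform("sum")
    # Factor for the remaining rows so that each date keeps its original total.
    factor = (total - scale * scaled_total) / (total - scaled_total)
    return df.assign(adjusted_weight=np.where(mask, w * scale, w * factor))

# Hypothetical two-date sample, made up for illustration only.
sample = pd.DataFrame(
    {"Weight": [0.5, 0.3, 0.2, 0.6, 0.4],
     "Test": [0.5, np.nan, 0.2, np.nan, 0.4]},
    index=["1/21/2017"] * 3 + ["1/22/2017"] * 2,
)

same_as_answer = adjust_weights(sample, scale=0.75)
as_in_question = adjust_weights(sample, scale=0.75, scale_null=True)
```

If every row of a date falls on the scaled side, there is nothing left to pro-rate, so that date's adjusted weights will sum to scale rather than 1; this sketch does not handle that case specially.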