
Creating a new column based on adjusting weights from other columns

I have a dataframe (df) with 10 columns. The index holds dates, and each date appears on multiple rows (the frame is sorted by date). The important columns for this problem are df['Weight'] and df['Test'].

Here is an example showing only those 2 columns for a single index value (1/21/2017); in reality there are multiple dates, each with multiple weights.

            Weight   Test
1/21/2017   0.1      NaN
1/21/2017   0.04     0.04
1/21/2017   0.03     NaN
1/21/2017   0.02     NaN
1/21/2017   0.2      0.2
1/21/2017   0.001    NaN
1/21/2017   0.1      0.1
1/21/2017   0.21     0.21
1/21/2017   0.003    NaN
1/21/2017   0.01     0.01
1/21/2017   0.04     0.04
1/21/2017   0.005    NaN
1/21/2017   0.05     0.05
1/21/2017   0.1      NaN
1/21/2017   0.091    NaN

df['Weight'] sums to 1 within each date, and this holds for every unique date in the index.
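For anyone who wants to reproduce this, the sample can be rebuilt roughly as follows and the sum-to-1 property checked per date. This is only a sketch: the variable names `weights`, `tests` and `sums` are my own, and the frame contains just the two relevant columns rather than all 10.

```python
import numpy as np
import pandas as pd

nan = np.nan
weights = [0.1, 0.04, 0.03, 0.02, 0.2, 0.001, 0.1, 0.21,
           0.003, 0.01, 0.04, 0.005, 0.05, 0.1, 0.091]
tests = [nan, 0.04, nan, nan, 0.2, nan, 0.1, 0.21,
         nan, 0.01, 0.04, nan, 0.05, nan, nan]

# Every row shares the same date, as in the sample above
df = pd.DataFrame({'Weight': weights, 'Test': tests},
                  index=['1/21/2017'] * len(weights))

# Sum of Weight for each distinct index value (date)
sums = df.groupby(level=0)['Weight'].sum()
print(np.allclose(sums, 1.0))
```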

I have created a Test column which only shows a weight if a condition is satisfied.

Now I am trying to create a column df['adjusted_weight'] based on the Test column. Where df['Test'] is NaN, the value in df['Weight'] should be multiplied by 0.75 and assigned to df['adjusted_weight']. For the remaining rows of that date, where df['Test'] is not NaN, the weights should be adjusted upwards pro rata and assigned to df['adjusted_weight'], so that df['adjusted_weight'] sums to 1 for every date.

I would like it to be flexible, so that I can also multiply the weights by other factors (0.5 as well as 0.75, for example), with the pro rata adjustment applied to the rest.

Thanks all so much for the help and support.

Best wishes.

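import numpy as np

def bool_scale(df, col, cond, scale):
    # True where the `cond` column holds a value (is not NaN)
    cond = df[cond].notnull().values
    # Weights of the chosen column as a NumPy array
    w = df[col].values
    # Total weight of the rows that get multiplied by `scale`
    w_up = w[cond].sum()
    # Non-null rows are scaled by `scale`; the NaN rows are grossed up
    # pro rata so the column still sums to 1 (assumes `col` sums to 1)
    return df.assign(
        adjusted_weight=np.where(
            cond, w * scale, w / (1 - w_up) * (1 - scale * w_up)))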

bool_scale(df, 'Weight', 'Test', .75)

           Weight  Test  adjusted_weight
1/21/2017   0.100   NaN         0.146429
1/21/2017   0.040  0.04         0.030000
1/21/2017   0.030   NaN         0.043929
1/21/2017   0.020   NaN         0.029286
1/21/2017   0.200  0.20         0.150000
1/21/2017   0.001   NaN         0.001464
1/21/2017   0.100  0.10         0.075000
1/21/2017   0.210  0.21         0.157500
1/21/2017   0.003   NaN         0.004393
1/21/2017   0.010  0.01         0.007500
1/21/2017   0.040  0.04         0.030000
1/21/2017   0.005   NaN         0.007321
1/21/2017   0.050  0.05         0.037500
1/21/2017   0.100   NaN         0.146429
1/21/2017   0.091   NaN         0.133250
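Note that in the output above it is the rows with a value in Test that are multiplied by 0.75, while the NaN rows are scaled up. If the opposite direction is wanted, as the question words it (NaN rows multiplied by 0.75, the rest scaled up pro rata), a minimal sketch could look like the following. The function name `scale_null_rows` and its defaults are hypothetical, not part of the answer above.

```python
import numpy as np
import pandas as pd

def scale_null_rows(df, col='Weight', cond='Test', scale=0.75):
    """Hypothetical variant: scale rows where `cond` is NaN, gross up the rest."""
    is_null = df[cond].isna().to_numpy()
    w = df[col].to_numpy(dtype=float)
    total = w.sum()
    w_null = w[is_null].sum()
    w_rest = total - w_null
    # Factor that lets the non-null rows absorb the weight taken off the NaN rows
    up = (total - scale * w_null) / w_rest if w_rest else np.nan
    return df.assign(adjusted_weight=np.where(is_null, w * scale, w * up))

nan = np.nan
df = pd.DataFrame(
    {'Weight': [0.1, 0.04, 0.03, 0.02, 0.2, 0.001, 0.1, 0.21,
                0.003, 0.01, 0.04, 0.005, 0.05, 0.1, 0.091],
     'Test': [nan, 0.04, nan, nan, 0.2, nan, 0.1, 0.21,
              nan, 0.01, 0.04, nan, 0.05, nan, nan]},
    index=['1/21/2017'] * 15)

out = scale_null_rows(df, scale=0.75)
print(out)
```

Passing `scale=0.5` instead covers the other factor mentioned in the question. If a date has no non-null Test rows, there is nothing to absorb the removed weight, so this sketch leaves the scale-up factor as NaN rather than guessing.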

With several dates in the index, you can apply it per date in a groupby on the index level:

kws = dict(col='Weight', cond='Test', scale=.75)
df.groupby(level=0).apply(bool_scale, **kws) 

                     Weight  Test  adjusted_weight
1/21/2017 1/21/2017   0.100   NaN         0.146429
          1/21/2017   0.040  0.04         0.030000
          1/21/2017   0.030   NaN         0.043929
          1/21/2017   0.020   NaN         0.029286
          1/21/2017   0.200  0.20         0.150000
          1/21/2017   0.001   NaN         0.001464
          1/21/2017   0.100  0.10         0.075000
          1/21/2017   0.210  0.21         0.157500
          1/21/2017   0.003   NaN         0.004393
          1/21/2017   0.010  0.01         0.007500
          1/21/2017   0.040  0.04         0.030000
          1/21/2017   0.005   NaN         0.007321
          1/21/2017   0.050  0.05         0.037500
          1/21/2017   0.100   NaN         0.146429
          1/21/2017   0.091   NaN         0.133250
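The result above carries the date twice in the index because the group key is prepended to each group's own index. One way to keep the original single-level index, sketched here, is to pass `group_keys=False` to `groupby`. The function is restated so the snippet runs on its own, and the two-date frame is made up purely for illustration.

```python
import numpy as np
import pandas as pd

def bool_scale(df, col, cond, scale):
    # Same logic as the function above, restated so this snippet is self-contained
    mask = df[cond].notnull().to_numpy()
    w = df[col].to_numpy(dtype=float)
    w_up = w[mask].sum()
    return df.assign(adjusted_weight=np.where(
        mask, w * scale, w / (1 - w_up) * (1 - scale * w_up)))

# Two dates; these rows are illustrative, not taken from the question
df = pd.DataFrame(
    {'Weight': [0.6, 0.4, 0.5, 0.3, 0.2],
     'Test': [0.6, np.nan, np.nan, 0.3, 0.2]},
    index=['1/21/2017', '1/21/2017', '1/22/2017', '1/22/2017', '1/22/2017'])

kws = dict(col='Weight', cond='Test', scale=.75)
# group_keys=False stops pandas from adding the date as an extra index level
out = df.groupby(level=0, group_keys=False).apply(bool_scale, **kws)

# Each date's adjusted weights should still sum to 1
print(out.groupby(level=0)['adjusted_weight'].sum())
```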
