
Apply formula based on condition in certain column

I have a DataFrame that looks like this:

import pandas as pd
df=pd.DataFrame([[1,0.10],[1,0.15],[3,0.16],[3,0.11],[3,0.12],[1,0.14],[2,0.17],
                  [2,0.19],[1,0.10]], columns=["a","b"])

The result is:

    a   b
0   1   0.10
1   1   0.15
2   3   0.16
3   3   0.11
4   3   0.12
5   1   0.14
6   2   0.17
7   2   0.19
8   1   0.10

I want to create a new column "c" whose value is derived from the value in column "b", depending on a condition on column "a". The expected result is easiest to show visually. Below, column "c" is written out as a string so that the intended calculation is clear; in reality I just need the resulting number.


    a     b               c
0   1   0.10    --->    (1-0.10)
1   1   0.15    --->    (1-0.15)
2   3   0.16    --->    (1-0.16)x(1-0.11)x(1-0.12)
3   3   0.11    --->    (1-0.16)x(1-0.11)x(1-0.12)
4   3   0.12    --->    (1-0.16)x(1-0.11)x(1-0.12)
5   1   0.14    --->    (1-0.14)
6   2   0.17    --->    (1-0.17)x(1-0.19)
7   2   0.19    --->    (1-0.17)x(1-0.19)
8   1   0.10    --->    (1-0.10)

So:

if the value in "a" is 1, then c = 1-b;

if the value in "a" is 2, then c = (1-b)x(1-b), the product taken over the consecutive rows where "a" is 2;

if the value in "a" is 3, then c = (1-b)x(1-b)x(1-b), the product taken over the consecutive rows where "a" is 3.

Use a custom function over groups of consecutive equal values, built with Series.shift and Series.cumsum. Inside the function passed to GroupBy.apply, test whether a (here x.name[0]) is greater than 1; if it is, assign the product of 1-b over the group to the new column c, otherwise assign 1-b itself:

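def f(x):
    x = x.copy()               # work on a copy of the group
    diff = 1-x['b']
    if x.name[0]>1:            # x.name is the group key: (a, run label)
        x['c'] = diff.prod()
    else:
        x['c'] = diff
    return x

# label each run of consecutive equal values in a
g = df['a'].ne(df['a'].shift()).cumsum()
# group_keys=False keeps the original index in recent pandas versions
df = df.groupby(['a',g], group_keys=False).apply(f)
print (df)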
   a     b         c
0  1  0.10  0.900000
1  1  0.15  0.850000
2  3  0.16  0.657888
3  3  0.11  0.657888
4  3  0.12  0.657888
5  1  0.14  0.860000
6  2  0.17  0.672300
7  2  0.19  0.672300
8  1  0.10  0.900000
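
To make the grouping key less opaque, here is a small sketch of what the shift/cumsum expression produces for the column "a" from the question. The intermediate name starts is only illustrative and is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 3, 3, 3, 1, 2, 2, 1]})

# True wherever the value differs from the previous row (the first row is always True)
starts = df["a"].ne(df["a"].shift())
# The running count of run starts gives every consecutive run its own label
g = starts.cumsum()
print(g.tolist())  # [1, 1, 2, 2, 2, 3, 4, 4, 5]
```

Rows that share a label form one run, so the three separate runs of a == 1 get the labels 1, 3 and 5 instead of being merged into a single group.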

Another idea:

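# c starts as 1-b for every row
df['c'] = df['b'].rsub(1)
# label each run of consecutive equal values in a
g = df['a'].ne(df['a'].shift()).cumsum()
# keep the label only for runs where a > 1, NaN elsewhere
g1 = g.where(df['a'].gt(1).groupby(g).transform('all'))
# product per kept run; rows with a NaN label fall back to 1-b
df['c'] = df.groupby(g1)['c'].transform('prod').fillna(df['c'])
print (df)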
   a     b         c
0  1  0.10  0.900000
1  1  0.15  0.850000
2  3  0.16  0.657888
3  3  0.11  0.657888
4  3  0.12  0.657888
5  1  0.14  0.860000
6  2  0.17  0.672300
7  2  0.19  0.672300
8  1  0.10  0.900000
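
The masking step in this second approach may be easier to follow with its intermediate values printed. This is only a sketch on the column "a" from the question; the name multi is illustrative and does not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 3, 3, 3, 1, 2, 2, 1]})

# Label each run of consecutive equal values in "a"
g = df["a"].ne(df["a"].shift()).cumsum()
# True for every row that belongs to a run in which all values of "a" exceed 1
multi = df["a"].gt(1).groupby(g).transform("all")
# Keep the run label for those rows only; the other rows become NaN
g1 = g.where(multi)
print(g1.tolist())
```

Because groupby drops NaN keys by default, the rows where a is 1 take no part in the product, and fillna then restores their 1-b value.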

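Here's what you could do, in pseudocode:

  1. Add a column diff that is 1-b
  2. Use pandas' groupby to group the rows into consecutive runs of equal a (grouping by a alone would merge all rows where a is 1)
  3. Use pandas' prod to build the product over diff in each run where a is greater than 1 and store that in column c; for rows where a is 1, c is just diff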
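
One possible way to turn these steps into code is sketched below. It assumes the groups are runs of consecutive equal values in a, as in the expected output of the question, and that rows where a is 1 keep 1-b unchanged; the names run and run_prod are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0.10], [1, 0.15], [3, 0.16], [3, 0.11], [3, 0.12],
     [1, 0.14], [2, 0.17], [2, 0.19], [1, 0.10]],
    columns=["a", "b"],
)

# Step 1: per-row factor 1-b
df["diff"] = 1 - df["b"]
# Step 2: label runs of consecutive equal values in "a" (assumed grouping)
run = df["a"].ne(df["a"].shift()).cumsum()
# Step 3: product of the factors within each run, broadcast back to every row
run_prod = df.groupby(run)["diff"].transform("prod")
# Rows with a == 1 keep their own factor, all other rows take the run product
df["c"] = df["diff"].where(df["a"].eq(1), run_prod)
print(df[["a", "b", "c"]])
```

Using transform rather than an aggregation keeps the result aligned with the original rows, so no merge back onto df is needed.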
