Apply formula based on condition in certain column

Question

I have a DataFrame that looks like this:

import pandas as pd

df1=pd.DataFrame([[1,0.10],[1,0.15],[3,0.16],[3,0.11],[3,0.12],[1,0.14],[2,0.17],
                  [2,0.19],[1,0.10]], columns=["a","b"])

The result is:

    a   b
0   1   0.10
1   1   0.15
2   3   0.16
3   3   0.11
4   3   0.12
5   1   0.14
6   2   0.17
7   2   0.19
8   1   0.10

I want to create a new column "c" whose value is derived from the value in column "b", depending on a condition on column "a". The expected result is easiest to see in the table below. I wrote the values in column "c" as strings to make the intended calculation clear; what I actually need is just the number.


    a     b               c
0   1   0.10    --->    (1-0.10)
1   1   0.15    --->    (1-0.15)
2   3   0.16    --->    (1-0.16)x(1-0.11)x(1-0.12)
3   3   0.11    --->    (1-0.16)x(1-0.11)x(1-0.12)
4   3   0.12    --->    (1-0.16)x(1-0.11)x(1-0.12)
5   1   0.14    --->    (1-0.14)
6   2   0.17    --->    (1-0.17)x(1-0.19)
7   2   0.19    --->    (1-0.17)x(1-0.19)
8   1   0.10    --->    (1-0.10)

So:

if the value in "a" is 1, then c = 1-b for that row;

if the value in "a" is 2, then c = (1-b)x(1-b), the product over the consecutive rows where "a" is 2;

if the value in "a" is 3, then c = (1-b)x(1-b)x(1-b), the product over the consecutive rows where "a" is 3.

Answer 1

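Build a key for groups of consecutive equal values with Series.shift and Series.cumsum, then apply a custom function with GroupBy.apply. Inside the function, test whether a (here x.name[0]) is greater than 1; if so, assign the product of 1-b to the new column c, otherwise assign 1-b itself. Here df is the DataFrame from the question (df1):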

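def f(x):
    # x.name is the group key (a, g), so x.name[0] is the value of a
    diff = 1-x['b']
    if x.name[0]>1:
        # one product for the whole group, repeated on every row
        x['c'] = diff.prod()
    else:
        x['c'] = diff
    return x

# key for groups of consecutive equal values in a
g = df['a'].ne(df['a'].shift()).cumsum()
# group_keys=False keeps the original row index in newer pandas versions
df = df.groupby(['a',g], group_keys=False).apply(f)
print (df)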
   a     b         c
0  1  0.10  0.900000
1  1  0.15  0.850000
2  3  0.16  0.657888
3  3  0.11  0.657888
4  3  0.12  0.657888
5  1  0.14  0.860000
6  2  0.17  0.672300
7  2  0.19  0.672300
8  1  0.10  0.900000
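The grouping key g is the part that is easiest to miss, so here is a minimal sketch of how it is built, using only column a from the question. The intermediate name starts is an assumption added for illustration; it does not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 3, 3, 3, 1, 2, 2, 1]})

# True wherever a differs from the previous row; the first row is compared
# against NaN, so it is always True
starts = df["a"].ne(df["a"].shift())

# The running count of True values gives one integer label per consecutive run
g = starts.cumsum()
print(g.tolist())  # [1, 1, 2, 2, 2, 3, 4, 4, 5]
```

Because the two separate runs of a == 1 get different labels (1, 3 and 5), grouping by g never merges rows that are not adjacent.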

Another idea:

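# start with c = 1-b on every row
df['c'] = df['b'].rsub(1)
# key for groups of consecutive equal values in a
g = df['a'].ne(df['a'].shift()).cumsum()
# keep the key only for groups where a > 1, NaN elsewhere
g1 = g.where(df['a'].gt(1).groupby(g).transform('all'))
# product per kept group; rows with a NaN key fall back to 1-b
df['c'] = df.groupby(g1)['c'].transform('prod').fillna(df['c'])
print (df)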
   a     b         c
0  1  0.10  0.900000
1  1  0.15  0.850000
2  3  0.16  0.657888
3  3  0.11  0.657888
4  3  0.12  0.657888
5  1  0.14  0.860000
6  2  0.17  0.672300
7  2  0.19  0.672300
8  1  0.10  0.900000
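To see why the second approach leaves the a == 1 rows untouched, it can help to look at the masked key on its own. The following is a sketch on the question's data; the names keep and one_minus_b are assumptions introduced here, and the result is kept in a separate Series c rather than written back to df.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0.10], [1, 0.15], [3, 0.16], [3, 0.11], [3, 0.12],
     [1, 0.14], [2, 0.17], [2, 0.19], [1, 0.10]],
    columns=["a", "b"],
)

# Labels for runs of consecutive equal values in a
g = df["a"].ne(df["a"].shift()).cumsum()

# True on every row of a run whose a values are all greater than 1
keep = df["a"].gt(1).groupby(g).transform("all")

# Run label where keep is True, NaN where a == 1
g1 = g.where(keep)
print(g1.isna().tolist())

# Rows with a NaN key are dropped from the grouping, so transform returns NaN
# for them and fillna restores the plain 1-b value
one_minus_b = df["b"].rsub(1)
c = one_minus_b.groupby(g1).transform("prod").fillna(one_minus_b)
print(c.round(6).tolist())
```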

Answer 2

Here's what you could do in pseudocode:

1. Add a column diff that is 1-b
2. Use pandas' groupby to group by a
3. Use pandas' prod to build the product over diff in each group and store that in column c
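The steps above can be sketched as follows. Taken literally, grouping by a pools every row with the same value, so the four a == 1 rows would all receive one shared product instead of their own 1-b. The second half is a variation that groups by consecutive runs and matches the table in the question; that variation, and the column names c_by_value and run, are assumptions added here, not part of this answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0.10], [1, 0.15], [3, 0.16], [3, 0.11], [3, 0.12],
     [1, 0.14], [2, 0.17], [2, 0.19], [1, 0.10]],
    columns=["a", "b"],
)

# Step 1: helper column with 1-b
df["diff"] = 1 - df["b"]

# Steps 2 and 3, taken literally: one product per distinct value of a
df["c_by_value"] = df.groupby("a")["diff"].transform("prod")

# Variation (assumption): group by runs of consecutive equal a values,
# and keep plain 1-b on rows where a is not greater than 1
run = df["a"].ne(df["a"].shift()).cumsum()
df["c"] = df.groupby(run)["diff"].transform("prod").where(df["a"] > 1, df["diff"])

print(df)
```

On this data the two columns agree for a == 2 and a == 3, because each of those values occurs in a single run; they differ only on the a == 1 rows.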
