I am doing some calculations on a DataFrame A: I want to add a new column RESULT, computed as follows. There are usually multiple rows with the same key1 value, and their key2 can be either X or Y. For each group of rows sharing the same key1: if key2 = X, then RESULT = 0; otherwise, RESULT = (C1 | key2 = Y) + (C2 | key2 = Y) + (C2 | key2 = X). See A_MODIFIED.
A =
  key1 key2  C1  C2
0    A    X   5   2
1    A    Y   3   2
2    B    X   6   1
3    B    Y   1   3
4    C    Y   1   4
5    D    X   2   3
6    D    Y   1   3
A_MODIFIED =
  key1 key2  C1  C2  RESULT
0    A    X   5   2       0
1    A    Y   3   2       7
2    B    X   6   1       0
3    B    Y   1   3       5
4    C    Y   1   4       5
5    D    X   2   3       0
6    D    Y   1   3       7
This is what I did:
import pandas as pd
import numpy as np
# Total of C2 per key1 group, with key1 kept as a regular column
df1 = A.groupby('key1', sort=False)['C2'].sum().rename('C2_T').reset_index()
# Only the rows where key2 is Y
df2 = A[A['key2'] == 'Y']
df3 = pd.merge(df1, df2, how='left', on='key1')
# Bracket assignment is needed to create a new column
df3['RESULT'] = df3['C1'] + df3['C2_T']
But now I don't know how to merge it with the original A.
You can apply a function f to each group.
The function f sums all values of column C2, because that sum does not depend on the value of key2. The C1 term does depend on key2, so only the values where df['key2'] == 'Y' are selected.
Finally, where df['key2'] == 'X', the output is set to 0.
print(A)
# key1 key2 C1 C2
#0 A X 5 2
#1 A Y 3 2
#2 B X 6 1
#3 B Y 1 3
#4 C Y 1 4
#5 D X 2 3
#6 D Y 1 3
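def f(df):
    # Work on a copy so the group passed in by groupby is not mutated
    df = df.copy()
    # All C2 values in the group plus the C1 values of the Y rows
    df['RESULT'] = df['C2'].sum() + df.loc[df['key2'] == 'Y', 'C1'].sum()
    # Rows with key2 == 'X' get 0 (single .loc call avoids chained assignment)
    df.loc[df['key2'] == 'X', 'RESULT'] = 0
    return df
df = A.groupby('key1', sort=False, group_keys=False).apply(f)
print(df)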
# key1 key2 C1 C2 RESULT
#0 A X 5 2 0
#1 A Y 3 2 7
#2 B X 6 1 0
#3 B Y 1 3 5
#4 C Y 1 4 5
#5 D X 2 3 0
#6 D Y 1 3 7
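As an alternative to the apply-based function above, the same rule can probably be expressed without a per-group Python function, using groupby(...).transform to broadcast the group sums back onto each row. This is only a sketch under the assumptions of the question (key2 is always X or Y); the names is_y, c2_total, c1_y_total and result are made up for illustration. It also avoids depending on how groupby.apply treats the grouping column, which has varied across pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
A = pd.DataFrame({
    'key1': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'key2': ['X', 'Y', 'X', 'Y', 'Y', 'X', 'Y'],
    'C1': [5, 3, 6, 1, 1, 2, 1],
    'C2': [2, 2, 1, 3, 4, 3, 3],
})

# Boolean mask of the rows where key2 is Y (assumes key2 is only X or Y).
is_y = A['key2'] == 'Y'

# Sum of C2 over every row of the group, broadcast back to each row.
c2_total = A.groupby('key1', sort=False)['C2'].transform('sum')

# Sum of C1 over the Y rows only: zero out C1 on non-Y rows before summing.
c1_y_total = (
    A['C1'].where(is_y, 0)
    .groupby(A['key1'], sort=False)
    .transform('sum')
)

# Y rows get the combined group total, all other rows get 0.
result = A.assign(RESULT=np.where(is_y, c2_total + c1_y_total, 0))
print(result)
```

Because transform returns values aligned to the original index, the RESULT column lines up with A directly and no merge back onto A is needed.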