Conditional dataframe operations using Pandas

I am doing some calculations on a DataFrame A: I want to add a new column RESULT and fill it with the following calculation.

Several rows usually share the same key1 value, and their key2 is either X or Y. For each group of rows with the same key1: if key2 = X, then RESULT = 0; otherwise RESULT = (C1 | key2 = Y) + (C2 | key2 = Y) + (C2 | key2 = X), that is, the C1 value of the Y row plus the C2 values of the Y and X rows. See A_MODIFIED.

    A =
        key1   key2  C1    C2    
    0   A      X     5     2     
    1   A      Y     3     2     
    2   B      X     6     1     
    3   B      Y     1     3     
    4   C      Y     1     4     
    5   D      X     2     3     
    6   D      Y     1     3     

    A_MODIFIED =
        key1   key2  C1    C2    RESULT
    0   A      X     5     2     0
    1   A      Y     3     2     7
    2   B      X     6     1     0
    3   B      Y     1     3     5
    4   C      Y     1     4     5
    5   D      X     2     3     0
    6   D      Y     1     3     7
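As a quick check of the rule against the tables above, the value for the Y row of group A can be recomputed by hand. This is only a sketch in plain Python with the numbers copied from A; the names group_a, c1_y and c2_all are made up for illustration.

```python
# Rows of the group key1 = A, copied from the table as (key2, C1, C2).
group_a = [('X', 5, 2), ('Y', 3, 2)]

# (C1 | key2 = Y): C1 summed over the Y rows only.
c1_y = sum(c1 for key2, c1, c2 in group_a if key2 == 'Y')

# (C2 | key2 = Y) + (C2 | key2 = X): C2 summed over the whole group.
c2_all = sum(c2 for key2, c1, c2 in group_a)

result_for_y_row = c1_y + c2_all
print(result_for_y_row)  # 3 + 2 + 2 = 7, as in row 1 of A_MODIFIED
```

A group without an X row, such as C, simply has no (C2 | key2 = X) term: 1 + 4 = 5.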

This is what I did:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(A.groupby('key1', sort = False).sum().ix[:, ['C2']].sum(axis=1), columns=['C2_T']).reset_index(level=1)
df2 = A[A['key2'] == 'Y']
df3 = pd.merge(df1, df2, how = 'left').set_index(df1.index)
df3.RESULT = df3.C1+ df3.C2_T

But now I don't know how to merge it with the original A.

You can apply a function f to each group.

Function f sums all values of column C2 in the group, because that term does not depend on the value of key2. The C1 term does depend on key2, so only the values where df['key2'] == 'Y' are selected.

Finally, RESULT is set to 0 in the rows where df['key2'] == 'X'.

print(A)
#  key1 key2  C1  C2
#0    A    X   5   2
#1    A    Y   3   2
#2    B    X   6   1
#3    B    Y   1   3
#4    C    Y   1   4
#5    D    X   2   3
#6    D    Y   1   3

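def f(df):
    # Work on a copy so the group passed in by groupby is not modified in place.
    df = df.copy()
    # C2 is summed over the whole group; C1 only over the rows where key2 == 'Y'.
    df['RESULT'] = df['C2'].sum() + df.loc[df['key2'] == 'Y', 'C1'].sum()
    # A single .loc call avoids chained assignment, which may fail to update df.
    df.loc[df['key2'] == 'X', 'RESULT'] = 0
    return df

# group_keys=False keeps the original row index instead of adding key1 to it.
df = A.groupby('key1', sort=False, group_keys=False).apply(f)
print(df)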
#  key1 key2  C1  C2  RESULT
#0    A    X   5   2       0
#1    A    Y   3   2       7
#2    B    X   6   1       0
#3    B    Y   1   3       5
#4    C    Y   1   4       5
#5    D    X   2   3       0
#6    D    Y   1   3       7
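
The same result can probably also be obtained without apply, using groupby(...).transform('sum') to broadcast the per-group sums back onto the rows. The following is a minimal sketch, not part of the original answer. It rebuilds A from the table in the question, and the names is_y, c2_total and c1_y_total are made up for illustration. It assumes, as the question states, that key2 is always either X or Y.

```python
import pandas as pd

# Sample data copied from the question.
A = pd.DataFrame({
    'key1': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'key2': ['X', 'Y', 'X', 'Y', 'Y', 'X', 'Y'],
    'C1': [5, 3, 6, 1, 1, 2, 1],
    'C2': [2, 2, 1, 3, 4, 3, 3],
})

is_y = A['key2'] == 'Y'

# Per-group total of C2, repeated on every row of the group.
c2_total = A.groupby('key1', sort=False)['C2'].transform('sum')

# Per-group total of C1 over the Y rows only: zero out the other rows first.
c1_y_total = A['C1'].where(is_y, 0).groupby(A['key1'], sort=False).transform('sum')

# Y rows get the sum; all other rows (the X rows, by assumption) get 0.
A_MODIFIED = A.assign(RESULT=(c2_total + c1_y_total).where(is_y, 0))
print(A_MODIFIED)
```

Because transform returns a Series aligned with the index of A, no merge back onto the original frame is needed. It also sidesteps groupby.apply, whose handling of group keys and grouping columns has changed between pandas versions.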

