How to calculate with conditions in pandas?

I have a dataframe like this. I want to add a new column computed with the formula Value = A (where Time=1) + A (where Time=3). A (where Time=5) should not be used.

Type subType Time   A           Value
 X    a       1      3         =3+9=12
 X    a       3      9  
 X    a       5      9
 X    b       1      4         =4+5=9
 X    b       3      5 
 X    b       5      0
 Y    a       1      1         =1+2=3
 Y    a       3      2  
 Y    a       5      3
 Y    b       1      4         =4+5=9
 Y    b       3      5 
 Y    b       5      2

I know how to do this by selecting the cells needed for the formula, but is there a better way to perform the calculation? I suspect I need to add a condition, but I'm not sure how. Any suggestions?
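For anyone who wants to reproduce the answers that follow, the sample frame can be rebuilt roughly like this. It is only a sketch: the column values are copied from the table in the question, and the target Value column is left out on purpose.

```python
import pandas as pd

# Sample data copied from the question; "Value" is what we want to compute.
df = pd.DataFrame({
    "Type":    ["X"] * 6 + ["Y"] * 6,
    "subType": ["a", "a", "a", "b", "b", "b"] * 2,
    "Time":    [1, 3, 5] * 4,
    "A":       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})
print(df)
```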

Use Series.eq with DataFrame.groupby and Series.cumsum to create the groups, then sum within each group.

c1 = df.Time.eq(1)
c3 = df.Time.eq(3)
df['Value'] = (df.loc[c1|c3]
                 .groupby(c1.cumsum())
                 .A
                 .transform('sum')
                 .loc[c1])
print(df)
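If it is not obvious why the cumsum trick works, it may help to look at the intermediate pieces. The sketch below repeats the same steps with named variables (group_id and sums are names made up here for illustration) and assumes the rows are ordered so that every block starts with Time == 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Type":    ["X"] * 6 + ["Y"] * 6,
    "subType": ["a", "a", "a", "b", "b", "b"] * 2,
    "Time":    [1, 3, 5] * 4,
    "A":       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

c1 = df["Time"].eq(1)   # True on the first row of each block
c3 = df["Time"].eq(3)

# The running count of Time == 1 rows gives every block its own label.
group_id = c1.cumsum()
print(group_id.tolist())

# Sum A per block, looking only at the Time 1 and Time 3 rows.
sums = df.loc[c1 | c3].groupby(group_id)["A"].transform("sum")

# Assignment aligns on the index, so rows missing from sums.loc[c1] become NaN.
df["Value"] = sums.loc[c1]
print(df)
```

Because the labels come from row order rather than from Type and subType, this only gives the right answer when the frame is sorted the way it is in the question.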

or, if you want to identify the rows by Time not being equal to 5:

c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum')
                     .where(c.shift(fill_value = True))
              )
# Another option is map
c = df['Time'].eq(5)
c_cumsum = c.cumsum()
df['value'] = (c_cumsum.map(df['A'].mask(c)
                                   .groupby(c_cumsum)
                                   .sum())
                       .where(c.shift(fill_value = True)))

Output

   Type subType  Time  A  Value
0     X       a     1  3   12.0
1     X       a     3  9    NaN
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    NaN
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    NaN
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    NaN
11    Y       b     5  2    NaN
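The where(c.shift(fill_value = True)) step is what leaves the sum only on the first row of each block. A small sketch on a toy Time column (the names time and first_of_block are just for illustration) shows the two helper Series involved:

```python
import pandas as pd

time = pd.Series([1, 3, 5, 1, 3, 5], name="Time")
c = time.eq(5)

# Group labels: note that each Time == 5 row shares its label with the
# block that follows it, which is why A is masked on those rows first.
labels = c.cumsum()
print(labels.tolist())

# Shifting down by one marks the row right after each Time == 5 row;
# fill_value=True also marks the very first row.
first_of_block = c.shift(fill_value=True)
print(first_of_block.tolist())
```

Like the first method, this relies on the rows being ordered so that Time == 5 closes each block.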

MISSING VALUES

c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum')
              )
#or method 1
#c1 = df.Time.eq(1)
#c3 = df.Time.eq(3)
#df['Value'] = (df.loc[c1|c3]
#                 .groupby(c1.cumsum())
#                 .A
#                 .transform('sum')
#               )
print(df)

Output

   Type subType  Time  A  value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    9.0
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    3.0
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    9.0
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    0.0

or filling all rows except those where Time is 5

c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum').mask(c))

#c1 = df.Time.eq(1)
#c3 = df.Time.eq(3)
#or method 1
#df['Value'] = (df.loc[c1|c3]
#                 .groupby(c1.cumsum())
#                 .A
#                 .transform('sum')
#                 .loc[c1|c3])
print(df)

Output

   Type subType  Time  A  value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    NaN

Why not use apply here?

Even on a small data frame it is already slower:

%%timeit

(
    df.groupby(by=['Type','subType'])
    .apply(lambda x: x.loc[x.Time!=5].A.sum()) # sum time each group exclu
    .to_frame('Value').reset_index()
    .pipe(lambda x: pd.merge(df, x, on=['Type', 'subType'], how='left'))
)
13.6 ms ± 2.67 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum')
                     .where(c.shift(fill_value = True))
              )

3.67 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

You can use groupby to sum A where Time is not 5, then merge the result back into the original df.

(
    df.groupby(by=['Type','subType'])
    .apply(lambda x: x.loc[x.Time!=5].A.sum()) # sum time each group exclu
    .to_frame('Value').reset_index()
    .pipe(lambda x: pd.merge(df, x, on=['Type', 'subType'], how='left'))
)


    Type    subType Time    A   Value
0   X       a       1       3   12.0
1   X       a       3       9   12.0
2   X       a       5       9   12.0
3   X       b       1       4   9.0
4   X       b       3       5   9.0
5   X       b       5       0   9.0
6   Y       a       1       1   3.0
7   Y       a       3       2   3.0
8   Y       a       5       3   3.0
9   Y       b       1       4   9.0
10  Y       b       3       5   9.0
11  Y       b       5       2   9.0
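The same "fill every row" result can probably be had without apply and merge, by blanking out the Time == 5 values and letting transform broadcast the group sums. This is only a sketch, not something from the answer above; keep is a name made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Type":    ["X"] * 6 + ["Y"] * 6,
    "subType": ["a", "a", "a", "b", "b", "b"] * 2,
    "Time":    [1, 3, 5] * 4,
    "A":       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

keep = df["Time"].ne(5)   # rows that take part in the sum

# Replace A with NaN on the Time == 5 rows, then sum per (Type, subType).
# transform returns one value per original row, so no merge is needed.
df["Value"] = (df["A"].where(keep)
                      .groupby([df["Type"], df["subType"]])
                      .transform("sum"))
print(df)
```

Grouping on the actual key columns means this version does not depend on the row order.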

Answer using only indexing and conditions:

df.loc[df['Time'] == 1,'Value'] = (df[df['Time'] == 1].reset_index()+df[df['Time'] == 3].reset_index())['A'].values
df

   Type subType  Time  A  Value
0     X       a     1  3   12.0
1     X       a     3  9    NaN
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    NaN
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    NaN
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    NaN
11    Y       b     5  2    NaN
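Note that the one-liner above pairs the Time == 1 and Time == 3 rows purely by position. If the row order cannot be guaranteed, one possible variation is to align the two slices on (Type, subType) instead. This is a sketch under the assumption that each (Type, subType) pair has exactly one row per Time value; keys, a1, a3 and total are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Type":    ["X"] * 6 + ["Y"] * 6,
    "subType": ["a", "a", "a", "b", "b", "b"] * 2,
    "Time":    [1, 3, 5] * 4,
    "A":       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

keys = ["Type", "subType"]

# Index both slices by the group keys so the addition aligns on them.
a1 = df[df["Time"] == 1].set_index(keys)["A"]
a3 = df[df["Time"] == 3].set_index(keys)["A"]
total = a1 + a3

# Write the totals back onto the Time == 1 rows, looked up by key.
first = df["Time"].eq(1)
lookup = pd.MultiIndex.from_frame(df.loc[first, keys])
df.loc[first, "Value"] = total.reindex(lookup).to_numpy()
print(df)
```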
