Groupby Pandas: generate multiple fields with a condition

I have a pandas dataframe like this:

import pandas as pd

df = pd.DataFrame({
    "Label": ["A", "A", "B", "B", "C", "C"],
    "Value": [1, 9, 1, 1, 9, 9],
    "Weight": [2, 4, 6, 8, 10, 12]})

I would like to group the data by 'Label' and generate 2 fields.

  • The first field, 'newweight', would sum Weight over the rows where Value == 1
  • The second field, 'weightvalue', would sum Weight * Value

So I would be left with the following dataframe:

Label     newweight     weightvalue
 A           2               38
 B           14              14
 C           0               198

I have looked into the pandas groupby() function but have had trouble generating the 2 fields with it.

With groupby.apply, you can do:

df.groupby('Label').apply(
  lambda g: pd.Series({
    "newweight": g.Weight[g.Value == 1].sum(),
    "weightvalue": g.Weight.mul(g.Value).sum()
})).fillna(0)

#       newweight  weightvalue
#Label
#A            2.0         38.0
#B           14.0         14.0
#C            0.0        198.0
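
For comparison, here is a minimal sketch of the same result without apply. It is not part of the answer above; it assumes the question's df and builds the two per-row quantities as temporary columns, so that a plain grouped sum is enough. The variable name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Label": ["A", "A", "B", "B", "C", "C"],
    "Value": [1, 9, 1, 1, 9, 9],
    "Weight": [2, 4, 6, 8, 10, 12],
})

# Compute both quantities row by row first, then sum them per Label.
result = (
    df.assign(
        # Weight where Value == 1, otherwise 0, so other rows drop out of the sum
        newweight=df["Weight"].where(df["Value"].eq(1), 0),
        # Weight * Value for every row
        weightvalue=df["Weight"] * df["Value"],
    )
    .groupby("Label")[["newweight", "weightvalue"]]
    .sum()
)
print(result)
```

Because nothing is dropped before summing, group C still appears with newweight 0 and no fillna is needed.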

pd.DataFrame({
    'Label': df.Label.unique(),
    'newweight': df.groupby('Label').apply(lambda x: sum((x.Value == 1) * x.Weight)).values,
    'weightvalue': df.groupby('Label').apply(lambda x: sum(x.Value * x.Weight)).values
})
Out[113]: 
  Label  newweight  weightvalue
0     A          2           38
1     B         14           14
2     C          0          198

Fast
A more complicated but neat approach using NumPy's bincount. It is likely also very fast, since it avoids calling a Python function per group.

import numpy as np

v = df.Value.values
w = df.Weight.values
p = v * w                              # Weight * Value per row
f, u = pd.factorize(df.Label.values)   # integer group codes, unique labels

pd.DataFrame(dict(
    newweight=np.bincount(f, w * (v == 1)).astype(int),
    weightvalue=np.bincount(f, p).astype(int)
), pd.Index(u, name='Label'))

       newweight  weightvalue
Label                        
A              2           38
B             14           14
C              0          198
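
If the bincount trick looks opaque, the following sketch isolates the two building blocks on the question's labels and weights. The names labels, weights, codes, uniques and sums are only illustrative. pd.factorize turns the labels into integer codes, and np.bincount with weights adds each weight into the bucket given by its code, which amounts to a grouped sum.

```python
import numpy as np
import pandas as pd

labels = np.array(["A", "A", "B", "B", "C", "C"])
weights = np.array([2, 4, 6, 8, 10, 12])

# factorize maps each distinct label to an integer code,
# numbered in order of first appearance
codes, uniques = pd.factorize(labels)
# codes   -> [0, 0, 1, 1, 2, 2]
# uniques -> ['A', 'B', 'C']

# With weights, bincount adds weights[i] into bucket codes[i],
# so bucket k holds the sum of the weights of group k
sums = np.bincount(codes, weights=weights)
# sums -> [ 6., 14., 22.]
print(dict(zip(uniques, sums)))
```

Note that bincount returns floats when weights are passed, which is why the answer above casts back with astype(int).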

Creative
Using pd.DataFrame.eval

e = """
weightvalue = Value * Weight
newweight = weightvalue * (Value == 1)
"""
# groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0
df.set_index('Label').eval(e).iloc[:, -2:].groupby(level=0).sum()

       weightvalue  newweight
Label                        
A               38          2
B               14         14
C              198          0
