Selecting rows based on sum over multiindex in Pandas
import pandas as pd
import numpy as np
np.random.seed(365)
rows = 100
data = {'Month': np.random.choice(['2014-01', '2014-02', '2014-03', '2014-04'], size=rows),
        'Code': np.random.choice(['A', 'B', 'C'], size=rows),
        'ColA': np.random.randint(5, 125, size=rows),
        'ColB': np.random.randint(0, 51, size=rows)}
df = pd.DataFrame(data)
df = df[((~((df.Code=='A')&(df.Month=='2014-04')))&(~((df.Code=='C')&(df.Month=='2014-03'))))]
dfg = df.groupby(['Code', 'Month']).sum()
The code above builds my DataFrame. I want to select only those entries whose sum of ColA is over 1000, where the sum is taken over index level 0 (Code).
dfg.ColA.sum(level=[0])
dfg[dfg.ColA.sum(level=[0])>1000]
The second line above throws an error. The expected output is:
              ColA  ColB
Code Month
B    2014-01   477   300
     2014-02   591   167
     2014-03   522   192
     2014-04   367   169
C    2014-01   412   180
     2014-02   275   205
     2014-04   901   309
You need to use groupby + transform to broadcast the sum values across the level=0 index, so that the boolean mask has the same index as dfg:
dfg[dfg.groupby(level=0)['ColA'].transform('sum').gt(1000)]
              ColA  ColB
Code Month
B    2014-01   477   300
     2014-02   591   167
     2014-03   522   192
     2014-04   367   169
C    2014-01   412   180
     2014-02   275   205
     2014-04   901   309
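To see why transform works where a plain per-level sum does not, here is a minimal sketch on a small hand-made frame. The name toy and its values are made up for illustration and are not the data from the question. A per-group sum returns one value per Code, so used as a mask it does not line up with the (Code, Month) rows, whereas transform('sum') repeats each group's total on every row of that group. Note also that the level= argument of Series.sum was deprecated in pandas 1.3 and, as far as I know, removed in 2.0, so groupby(level=0) is the safer spelling either way.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question
idx = pd.MultiIndex.from_tuples(
    [('A', '2014-01'), ('A', '2014-02'), ('B', '2014-01'), ('B', '2014-02')],
    names=['Code', 'Month'],
)
toy = pd.DataFrame({'ColA': [100, 200, 600, 700], 'ColB': [1, 2, 3, 4]}, index=idx)

# One total per Code: the index is just ['A', 'B'], so it cannot be used
# directly as a row mask on the two-level index of `toy`
totals = toy.groupby(level=0)['ColA'].sum()

# transform('sum') returns a Series with the same MultiIndex as `toy`,
# carrying each group's total on every row of that group
broadcast = toy.groupby(level=0)['ColA'].transform('sum')

# The comparison now yields one boolean per row, which is a valid mask
mask = broadcast.gt(1000)
selected = toy[mask]
print(selected)
```

Here only the rows for Code B survive, because B's total (1300) exceeds 1000 while A's (300) does not.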
Another way to do the same, starting from the ungrouped df:
groups = [g for _,g in df.groupby('Code') if g.ColA.sum()>1000]
pd.concat(groups).groupby(['Code', 'Month']).sum()
'''
              ColA  ColB
Code Month
B    2014-01   477   300
     2014-02   591   167
     2014-03   522   192
     2014-04   367   169
C    2014-01   412   180
     2014-02   275   205
     2014-04   901   309
'''
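The list comprehension plus pd.concat above can probably also be written with GroupBy.filter, which keeps all rows of every group for which the function returns True. This is a sketch of that variant on hypothetical data (the name raw and its values are invented for illustration), not something taken from the answers above.

```python
import pandas as pd

# Hypothetical ungrouped data, standing in for `df` from the question
raw = pd.DataFrame({
    'Month': ['2014-01', '2014-02', '2014-01', '2014-02', '2014-01'],
    'Code': ['A', 'A', 'B', 'B', 'B'],
    'ColA': [100, 200, 600, 300, 400],
    'ColB': [1, 2, 3, 4, 5],
})

# Keep only the rows whose Code group has a ColA total above 1000
kept = raw.groupby('Code').filter(lambda g: g['ColA'].sum() > 1000)

# Aggregate the surviving rows per (Code, Month), as in the question
result = kept.groupby(['Code', 'Month']).sum()
print(result)
```

filter saves the explicit concat step, but it calls a Python function once per group, so on frames with many groups the transform-based mask is likely to be faster.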