Excluding a column after grouping by in pandas
I'm wondering why the following is not possible and how to get around it.
I've taken a data frame, grouped it by one column, and assigned the result to a new variable. When I then try to select columns from that result, I get an error.
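import pandas as pd

df = pd.DataFrame({'group': list('aaaabbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})
df
newdf = df.groupby("group")  # a DataFrameGroupBy object, not a DataFrame
newdf.loc[:, newdf.columns != 'val']  # raises AttributeError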
# Same attempt, this time grouping by two columns
df = pd.DataFrame({'group1': list('aaaabbbb'),
                   'group2': list('ccccbbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})
df
newdf = df.groupby(["group1", "group2"])
newdf.loc[:, newdf.columns != 'val']  # raises AttributeError
AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method
I use both of these data frames to compute an IQR-style range, like the following:
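Q1 = df1.quantile(0.15)  # 15th percentile of each column
Q3 = df1.quantile(0.85)  # 85th percentile of each column
IQR = Q3 - Q1            # spread between the two quantiles
df1 = pd.DataFrame(IQR).reset_index()  # turn the resulting Series back into a data frame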
You need to specify an aggregation function with groupby, for example sum. On its own, df.groupby(...) returns a DataFrameGroupBy object rather than a data frame, which is why loc is not available on it. In addition, you likely want the result to be a pd.DataFrame without the groupby columns set as the index. This can be achieved by setting as_index=False.
Try this:
import pandas as pd

df = pd.DataFrame({'group1': list('aaaabbbb'),
                   'group2': list('ccccbbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})

# Aggregating turns the GroupBy object back into a DataFrame
newdf = df.groupby(['group1', 'group2'], as_index=False).sum()
newdf.loc[:, newdf.columns != 'val']
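As a possible alternative to the boolean mask on loc, the column can be dropped after aggregating, or only the wanted column can be selected on the grouped object before aggregating. The sketch below reuses the sample data from this answer; the names summed, without_val and only_id are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'group1': list('aaaabbbb'),
                   'group2': list('ccccbbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})

# Option 1: aggregate every column, then drop the unwanted one.
summed = df.groupby(['group1', 'group2'], as_index=False).sum()
without_val = summed.drop(columns='val')

# Option 2: pick the wanted column on the GroupBy object, then aggregate.
# The double brackets keep the selection two-dimensional (a DataFrameGroupBy).
only_id = df.groupby(['group1', 'group2'], as_index=False)[['id']].sum()

print(without_val)
print(only_id)
```

Both options should give the same three columns (group1, group2, id). Option 2 avoids aggregating val at all, which may matter on wide frames.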
One way to demonstrate this in more detail:
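newdf = df.groupby(['group1', 'group2'])
# The exact module path in the output varies by pandas version
print(type(newdf))        # a DataFrameGroupBy class, e.g. <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
print(type(newdf.sum()))  # <class 'pandas.core.frame.DataFrame'>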
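Returning to the quantile range from the question: if the goal is a per-group spread between the 15th and 85th percentiles with val excluded, one possible sketch is to call quantile on the grouped object directly. This is an assumption about the asker's intent, since df1 is not defined in the question; the names grouped, q_low, q_high and spread are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'group1': list('aaaabbbb'),
                   'group2': list('ccccbbbb'),
                   'val': [1, 3, 3, 2, 5, 6, 6, 2],
                   'id': [1, 1, 2, 2, 2, 3, 3, 3]})

# Keep only the 'id' column so that 'val' is excluded from the calculation.
grouped = df.groupby(['group1', 'group2'])[['id']]

# quantile is itself an aggregation, so each call returns a DataFrame
# indexed by the group keys.
q_low = grouped.quantile(0.15)
q_high = grouped.quantile(0.85)

# The two results share the same index, so they subtract row by row;
# reset_index moves the group keys back into ordinary columns.
spread = (q_high - q_low).reset_index()
print(spread)
```

Here the grouped quantiles take the place of the sum aggregation; any other aggregation that returns one row per group could be used the same way.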