Filtering a Pandas pivot table *during* the pivot
Let's assume we have the following data frame df:
import pandas as pd

df = pd.DataFrame({'food': ['spam', 'ham', 'eggs', 'ham', 'ham', 'eggs', 'milk'],
                   'sales': [10, 15, 12, 5, 14, 3, 8]})
I'd like to pivot this data to show the sum of sales by food, but only where that sum is greater than 12. The resulting pivot table would look as follows:
Unfiltered df:
food sum(sales)
spam 10
ham 34
eggs 15
milk 8
Filtered df:
food sum(sales)
ham 34
eggs 15
I can use groupby() as follows:
df.groupby(['food'])['sales'].agg('sum') > 12
But this only gives me a boolean Series, not the filtered df.
Is it possible to filter a column "on the fly" when using the pd.pivot_table() function (i.e. without pre-filtering the df)?
You can pass a lambda function to .loc, which will filter the dataframe down to only the rows that match the condition the lambda function returns:
filtered = df.groupby('food')['sales'].sum().reset_index().loc[lambda x: x['sales'] > 12]
Output:
>>> filtered
food sales
0 eggs 15
1 ham 34
(In case you're wondering, the lambda function gets executed once for the whole dataframe, not for each individual row, so yes, it's very efficient :)
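Since the question asks about pd.pivot_table() specifically, here is a minimal sketch of the same .loc-with-lambda idea chained onto a pivot table. As far as I know, pd.pivot_table() itself has no argument for filtering on the aggregated values, so the filter is applied to its result. The names pivoted and t are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'food': ['spam', 'ham', 'eggs', 'ham', 'ham', 'eggs', 'milk'],
                   'sales': [10, 15, 12, 5, 14, 3, 8]})

# Sum sales per food with pivot_table, then keep only the rows of the
# aggregated table whose total exceeds 12, all in one chained expression.
pivoted = (
    pd.pivot_table(df, index='food', values='sales', aggfunc='sum')
      .loc[lambda t: t['sales'] > 12]
)
print(pivoted)
```

Here food stays as the index of the result; add .reset_index() at the end of the chain if you would rather have it as a regular column.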
Aggregating the groupby produces a Series object. It's not pretty, but you can subset it dynamically using:
df.groupby(['food'])['sales'].agg('sum')[df.groupby(['food'])['sales'].agg('sum')>12]
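The one-liner above computes the same groupby twice. A possible variant, sketched here with the illustrative names totals and filtered_totals, stores the aggregated Series once and then indexes it with its own boolean mask:

```python
import pandas as pd

df = pd.DataFrame({'food': ['spam', 'ham', 'eggs', 'ham', 'ham', 'eggs', 'milk'],
                   'sales': [10, 15, 12, 5, 14, 3, 8]})

# Aggregate once and keep the result.
totals = df.groupby('food')['sales'].sum()

# Boolean indexing: keep only the foods whose total sales exceed 12.
filtered_totals = totals[totals > 12]
print(filtered_totals)
```

If you prefer a single expression, chaining .loc[lambda s: s > 12] onto the aggregation should give the same result without the intermediate variable.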