Including missing combinations of values in a pandas groupby aggregation
Problem
I want the output of a pandas groupby aggregation to include every possible value, or combination of values, of the grouping columns, including combinations that never occur in the data.
Example
The example pandas DataFrame has three columns: User, Code, and Subtotal.
import pandas as pd
example_df = pd.DataFrame([['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]], columns=['User', 'Code', 'Subtotal'])
I'd like to group on User and Code and get a subtotal for each combination of the two.
print(example_df.groupby(['User', 'Code']).Subtotal.sum().reset_index())
The output I get is:
  User  Code  Subtotal
0    a     1         1
1    a     2         1
2    b     1         1
3    b     2         1
4    c     1         2
How can I include the missing combination User == 'c' and Code == 2 in the table, even though it doesn't exist in example_df?
Preferred output
Below is the preferred output, with a zero row for the User == 'c' and Code == 2 combination.
  User  Code  Subtotal
0    a     1         1
1    a     2         1
2    b     1         1
3    b     2         1
4    c     1         2
5    c     2         0
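You can use unstack followed by stack. unstack pivots the Code level into columns and fills the cell for the missing combination with 0; stack then pivots the columns back into rows: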
print(example_df.groupby(['User', 'Code']).Subtotal.sum()
                .unstack(fill_value=0)
                .stack()
                .reset_index(name='Subtotal'))
  User  Code  Subtotal
0    a     1         1
1    a     2         1
2    b     1         1
3    b     2         1
4    c     1         2
5    c     2         0
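To make the unstack/stack round trip easier to follow, here is a minimal sketch that prints the intermediate wide table. The variable names totals, wide, and result are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

example_df = pd.DataFrame(
    [['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]],
    columns=['User', 'Code', 'Subtotal'],
)

# Step 1: one subtotal per observed (User, Code) pair; ('c', 2) is absent here.
totals = example_df.groupby(['User', 'Code'])['Subtotal'].sum()

# Step 2: pivot Code into columns. The cell for ('c', 2) has no data,
# so fill_value=0 puts a 0 there instead of NaN.
wide = totals.unstack(fill_value=0)
print(wide)

# Step 3: pivot the columns back into rows. Every cell of the wide table,
# including the filled one, becomes a row of the long result.
result = wide.stack().reset_index(name='Subtotal')
print(result)
```

Because the gap is filled before stacking, there are no NaN cells left for stack to handle, and the result keeps the integer dtype of the original subtotals.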
Another solution is to reindex the result with a MultiIndex created by MultiIndex.from_product:
df = example_df.groupby(['User', 'Code']).Subtotal.sum()
mux = pd.MultiIndex.from_product(df.index.levels, names=['User', 'Code'])
print(mux)
MultiIndex(levels=[['a', 'b', 'c'], [1, 2]],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
names=['User', 'Code'])
print(df.reindex(mux, fill_value=0).reset_index(name='Subtotal'))
  User  Code  Subtotal
0    a     1         1
1    a     2         1
2    b     1         1
3    b     2         1
4    c     1         2
5    c     2         0
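One thing to keep in mind with the reindex approach: df.index.levels only contains values that appear somewhere in the grouped data. If a value should be reported even though it never occurs at all, the full lists can be passed to from_product explicitly. The sketch below assumes a hypothetical Code 3 that is not in example_df; the lists all_users and all_codes are illustrative assumptions, not part of the original question.

```python
import pandas as pd

example_df = pd.DataFrame(
    [['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]],
    columns=['User', 'Code', 'Subtotal'],
)

totals = example_df.groupby(['User', 'Code'])['Subtotal'].sum()

# Hypothetical complete value lists. Code 3 never appears in example_df,
# so it would not be found in totals.index.levels.
all_users = ['a', 'b', 'c']
all_codes = [1, 2, 3]

# Cartesian product of the two lists: 3 users x 3 codes = 9 combinations.
full_index = pd.MultiIndex.from_product([all_users, all_codes], names=['User', 'Code'])

# Combinations missing from totals get a subtotal of 0.
result = totals.reindex(full_index, fill_value=0).reset_index(name='Subtotal')
print(result)
```

The same pattern should extend to more than two grouping columns by passing one list per column to from_product.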