Python Pandas: Replacing a groupby operation

Question

I have the following table as a pandas dataframe:

| ID | Name | Sales | Source   |
|----|------|-------|----------|
| 1  | a    | 34    | Source A |
| 2  | b    | 3423  | Source A |
| 3  | c    | 2     | Source A |
| 4  | d    | 342   | Source A |
| 3  | c    | 34    | Source A |
| 5  | e    | 234   | Source A |
| 6  | f    | 234   | Source A |
| 7  | g    | 23    | Source A |
| 1  | a    | 12    | Source B |
| 2  | b    | 42    | Source B |
| 3  | c    | 9     | Source B |
| 2  | b    | 22    | Source B |
| 1  | a    | 1     | Source B |
| 8  | h    | 56    | Source B |

What is the best way to (i) aggregate sales for each ID within each source and (ii) put the result in two new columns, "Source A" and "Source B", so that the resulting dataframe looks as follows:

| ID | Name | Source A | Source B |
|----|------|----------|----------|
| 1  | a    | 34       | 13       |
| 2  | b    | 3423     | 64       |
| 3  | c    | 36       | 9        |
| 4  | d    | 342      | 0        |
| 5  | e    | 234      | 0        |
| 6  | f    | 234      | 0        |
| 7  | g    | 23       | 0        |
| 8  | h    | 0        | 56       |

My initial approach was as follows:

import pandas as pd

data = {"ID": [1,2,3,4,3,5,6,7,1,2,3,2,1,8],
        "Name": list("abcdcefgabcbah"),
        "Sales": [34,3423,2,342,34,234,234,23,12,42,9,22,1,56],
        "Source": ["Source A"]*8 + ["Source B"]*6
       }
df = pd.DataFrame(data)

# Sum sales per (ID, Name, Source), then spread Source into columns
df.groupby(["ID","Name","Source"])["Sales"].sum().unstack()

Question: my initial table is built from different files and then combined with pd.concat. So it feels like I could get the final table by concatenating (or merging) differently in the first place. Is there a better approach to achieve this? As a side note: the actual data table consists of 6 different sources.

Thanks for your help!

Answer 1

You can use `pd.crosstab`:

pd.crosstab(df.Name, df.Source, df.Sales, aggfunc='sum').fillna(0)

Output:

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Or use `pivot_table`:

df.pivot_table('Sales','Name','Source', aggfunc='sum').fillna(0)

Output:

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

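Or use `set_index` and sum over the index levels, then `unstack`. This was originally written as `sum(level=[0,1])`, but the `level` parameter of `sum` was removed in pandas 2.0, so on current versions the same step is `groupby(level=[0,1]).sum()`:

df.set_index(['Name','Source'])['Sales'].groupby(level=[0,1]).sum().unstack(fill_value=0)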

Output:

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56
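
The three variants above index by Name only, so the ID column from the desired output is dropped, and the first two return floats. A minimal sketch of one way to get closer to the requested table, assuming the `df` from the question: pass both `ID` and `Name` as the index of `pivot_table`, fill the gaps with 0, and cast back to integers explicitly (the cast is spelled out because the dtype after filling can differ between pandas versions). The variable name `wide` is just illustrative.

```python
import pandas as pd

# Same sample data as in the question
data = {"ID": [1, 2, 3, 4, 3, 5, 6, 7, 1, 2, 3, 2, 1, 8],
        "Name": list("abcdcefgabcbah"),
        "Sales": [34, 3423, 2, 342, 34, 234, 234, 23, 12, 42, 9, 22, 1, 56],
        "Source": ["Source A"] * 8 + ["Source B"] * 6}
df = pd.DataFrame(data)

wide = (
    df.pivot_table(index=["ID", "Name"], columns="Source",
                   values="Sales", aggfunc="sum", fill_value=0)
      .astype(int)      # make the integer dtype explicit
      .reset_index()    # turn ID and Name back into ordinary columns
)
wide.columns.name = None  # drop the leftover "Source" label on the column axis

print(wide)
```

With six sources instead of two, the same call should produce six value columns without any change.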

Answer 2

Try the following code:

df.groupby(['Name', 'Source'])['Sales'].sum()\
    .unstack(1).fillna(0).reset_index()
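
On the part of the question about concatenating differently in the first place: a rough sketch, assuming each source is still available as its own dataframe before `pd.concat` is applied. The `frames` dict below is a hypothetical stand-in for the files being loaded. Each source is summed on its own and the results are then concatenated column-wise, so pandas aligns them on (ID, Name). This still calls `groupby` once per source, so it moves the aggregation rather than removing it.

```python
import pandas as pd

# Hypothetical per-source frames standing in for the separately loaded files
frames = {
    "Source A": pd.DataFrame({"ID": [1, 2, 3, 4, 3, 5, 6, 7],
                              "Name": list("abcdcefg"),
                              "Sales": [34, 3423, 2, 342, 34, 234, 234, 23]}),
    "Source B": pd.DataFrame({"ID": [1, 2, 3, 2, 1, 8],
                              "Name": list("abcbah"),
                              "Sales": [12, 42, 9, 22, 1, 56]}),
}

# One Series of summed sales per source, indexed by (ID, Name)
per_source = {name: frame.groupby(["ID", "Name"])["Sales"].sum()
              for name, frame in frames.items()}

# Column-wise concat aligns on the index; the dict keys become column names
wide = (
    pd.concat(per_source, axis=1)
      .fillna(0)        # IDs missing from a source get 0
      .astype(int)
      .sort_index()
      .reset_index()
)

print(wide)
```

Whether this is preferable to a single `pivot_table` on the concatenated frame probably depends on how the files are loaded; with six sources the dict simply gets six entries.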

Answer 1: score 3, accepted, posted 2019-01-30 19:16:02
Answer 2: score 1, posted 2019-01-30 18:46:49
