Mixing aggregation and grouping in pandas

Question

I have a data set called 'report' that holds details of delivery drivers. 'Pass' means they delivered on time and 'Fail' means they did not.

Name|Outcome
A   |Pass
B   |Fail
C   |Pass
D   |Pass
A   |Fail
C   |Pass

What I want:

Name|Pass|Fail|Total
A   |1   |1   |2
B   |0   |1   |1
C   |2   |0   |2
D   |1   |0   |1

I tried:

report.groupby(['Name','outcome']).agg(['count'])

but it does not give me the required output.

Many thanks.

Answer 1

Use crosstab with the margins=True and margins_name parameters:

print (pd.crosstab(df['Name'], df['Outcome'], margins=True, margins_name='Total'))
Outcome  Fail  Pass  Total
Name                      
A           1     1      2
B           1     0      1
C           0     2      2
D           0     1      1
Total       2     4      6

Then remove the last row (the 'Total' margin row) by position with DataFrame.iloc:

df = pd.crosstab(df['Name'], df['Outcome'], margins=True, margins_name='Total').iloc[:-1]
print (df)
Outcome  Fail  Pass  Total
Name                      
A           1     1      2
B           1     0      1
C           0     2      2
D           0     1      1

Answer 2

This is pd.crosstab followed by a sum over axis=1:

df = pd.crosstab(df['Name'], df['Outcome'])
df['Total'] = df[['Fail', 'Pass']].sum(axis=1)

Outcome  Fail  Pass  Total
Name                      
A           1     1      2
B           1     0      1
C           0     2      2
D           0     1      1

Or, to remove the column axis name, use rename_axis:

df = pd.crosstab(df['Name'], df['Outcome']).reset_index().rename_axis(None, axis='columns')
df['Total'] = df[['Fail', 'Pass']].sum(axis=1)

  Name  Fail  Pass  Total
0    A     1     1      2
1    B     1     0      1
2    C     0     2      2
3    D     0     1      1

Answer 3

In [1]: from io import StringIO
   ...: import pandas as pd

In [2]: df_string = '''Name|Outcome
   ...: A   |Pass
   ...: B   |Fail
   ...: C   |Pass
   ...: D   |Pass
   ...: A   |Fail
   ...: C   |Pass'''


In [3]: report = pd.read_csv(StringIO(df_string), sep='|')

In [4]: report.assign(count=1).groupby(["Name", "Outcome"])["count"].sum().unstack().assign(Total=lambda df: df.sum(axis=1))
Out[4]:
Outcome  Fail  Pass  Total
Name
A         1.0   1.0    2.0
B         1.0   NaN    1.0
C         NaN   2.0    2.0
D         NaN   1.0    1.0

You can then fill the NaN values using the fillna(0) method.
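A minimal sketch of that last step, assuming the same data as in the question (the frame is rebuilt inline here so the snippet runs on its own). Filling before adding the Total column, and the astype(int) call to get integer counts back, are choices made for this illustration rather than part of the answer above.

```python
import pandas as pd

# Same rows as the question's 'report' data set, rebuilt inline.
report = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "A", "C"],
    "Outcome": ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"],
})

counts = (
    report.assign(count=1)
    .groupby(["Name", "Outcome"])["count"]
    .sum()
    .unstack()      # Outcome values become columns; missing pairs are NaN
    .fillna(0)      # replace those NaN values with 0
    .astype(int)    # NaN forced floats, so cast back to integers
)
counts["Total"] = counts.sum(axis=1)
print(counts)
```

Passing fill_value=0 to unstack would likely achieve the same thing in one step.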

Answer 4

One way to do it is with pandas.get_dummies and groupby (here df1 is the input frame, with lowercase column names name and outcome):

report = pd.get_dummies(df1, columns=['outcome']).groupby(['name'], as_index=False).sum().rename(columns={"outcome_Fail":"Fail", "outcome_Pass":"Pass"})

report["Total"] = report["Pass"] + report["Fail"]

print(report)

Output:

    name Fail Pass Total
0   A     1    1    2
1   B     1    0    1
2   C     0    2    2
3   D     0    1    1

Answer 5

What you want is DataFrame.pivot_table rather than groupby. Please check the official documentation; if it is still unclear, show what you have tried and I can help further.
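For reference, a rough sketch of what a pivot_table version might look like. This is not from the answer itself: the helper column n (a constant 1 per row to sum over) is a hypothetical name introduced for the example, and the data is rebuilt inline from the question.

```python
import pandas as pd

# Same rows as the question's 'report' data set, rebuilt inline.
report = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "A", "C"],
    "Outcome": ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"],
})

table = (
    report.assign(n=1)          # hypothetical helper column: one per delivery
    .pivot_table(
        index="Name",
        columns="Outcome",
        values="n",
        aggfunc="sum",          # summing the ones counts the rows
        fill_value=0,           # Name/Outcome pairs that never occur become 0
    )
)
table["Total"] = table.sum(axis=1)
print(table)
```

The result has the same shape as the crosstab answers: one row per Name, with Fail, Pass, and Total columns.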

Answer 1
6 2019-11-19 11:08:52

Answer 2
5 2019-11-19 11:01:34

Answer 3
1 2019-11-19 10:56:34

Answer 4
0 2019-11-19 11:09:32

Answer 5
-1 2019-11-19 10:53:17
