Python: Pivot table with groupby counting of category

Let's say I have a file that looks like this:

+---------+---------+-------+
| Product | Quality | Origin|
+---------+---------+-------+
| Apple   | Good    |       |
+---------+---------+-------+
| Apple   | Bad     |       |
+---------+---------+-------+
| Apple   | Bad     |       |
+---------+---------+-------+
| Orange  | Good    |       |
+---------+---------+-------+
| .       |         |       |
+---------+---------+-------+
| .       |         |       |
+---------+---------+-------+
| Grape   | Good    |       |
+---------+---------+-------+

I want to make a pivot result with counts:

+---------+---------------+------+-----+
| Product | Total Number  | Good | Bad |
+---------+---------------+------+-----+
| Apple   | 5             | 3    | 2   |
+---------+---------------+------+-----+
| Orange  | 8             | 5    | 3   |
+---------+---------------+------+-----+
| Grape   | 3             | 1    | 2   |
+---------+---------------+------+-----+
| Total   | 16            | 9    | 7   |
+---------+---------------+------+-----+

I am using groupby and count to get the total number:

Total_Product = ProductFile.groupby('Product').count()

But how can I make the result table contain Good and Bad counts?

Here is one way, using assign and pivot_table. The assign call adds a column of ones, and summing that column within each Product/Quality pair gives the counts in the final table.

from io import StringIO
import pandas as pd

data = '''Product  Quality 
Apple    Good    
Apple    Bad     
Apple    Bad     
Orange   Good
Orange   Bad
Grape    Good    
'''

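# Raw string avoids an invalid-escape warning for the regex separator.
df = (pd.read_csv(StringIO(data), sep=r'\s+', engine='python')
        .assign(counter=1)               # helper column of ones
        .pivot_table(index='Product',
                     columns='Quality',
                     values='counter',
                     aggfunc='sum',      # summing the ones yields counts
                     fill_value=0,       # missing combinations become 0
                     margins=True,       # add row and column totals
                     margins_name='Totals')
     )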
print(df)

Quality  Bad  Good  Totals
Product                   
Apple      2     1       3
Grape      0     1       1
Orange     1     1       2
Totals     3     3       6
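
As an alternative to the helper column of ones, pandas also provides pd.crosstab, which counts co-occurrences directly. The following is a minimal sketch, not part of the original answer; it assumes the same six sample rows, built here as a DataFrame rather than read from text, and the name counts is chosen only for illustration.

```python
import pandas as pd

# Sample rows mirroring the data used in the answer (an assumption for
# illustration; a real file would be loaded with pd.read_csv instead).
df = pd.DataFrame({
    'Product': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Grape'],
    'Quality': ['Good', 'Bad', 'Bad', 'Good', 'Bad', 'Good'],
})

# crosstab tallies each Product/Quality pair, so no counter column is needed.
# margins=True appends the row and column totals.
counts = pd.crosstab(df['Product'], df['Quality'],
                     margins=True, margins_name='Totals')
print(counts)
```

This should give the same counts as the pivot_table version; which one to use is largely a matter of taste.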

(Providing the column names and ordering is straightforward and not shown.)
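
For completeness, here is one possible way to get the column names and ordering asked for in the question. It is a sketch under assumptions: the sample rows are rebuilt as a DataFrame, the margin is named Total to match the question's table, and products are ordered by first appearance in the data. The variable names table and row_order are illustrative only.

```python
import pandas as pd

# Sample rows mirroring the answer's data (assumed for illustration).
df = pd.DataFrame({
    'Product': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Grape'],
    'Quality': ['Good', 'Bad', 'Bad', 'Good', 'Bad', 'Good'],
})

table = (df.assign(counter=1)
           .pivot_table(index='Product', columns='Quality', values='counter',
                        aggfunc='sum', fill_value=0,
                        margins=True, margins_name='Total'))

# Rename the margin column, then select columns in the desired order.
table = table.rename(columns={'Total': 'Total Number'})
table = table[['Total Number', 'Good', 'Bad']]

# Order rows by first appearance in the data, with the totals row last.
row_order = list(df['Product'].unique()) + ['Total']
table = table.reindex(row_order)

# Drop the 'Quality' label that pivot_table leaves on the column axis.
table.columns.name = None
print(table)
```

Calling table.reset_index() afterwards would turn Product into an ordinary column, if a flat table is preferred.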


 