Create multiple columns with calculations using groupby, aggregate functions in Pandas

import pandas as pd
df = pd.DataFrame({'zip,company': ["46062|A","11236|B","11236|C","11236|C","11236|C","11236|A","11236|A","11236|A","11236|B","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A"], 
                   'goodbadscore': ["good","bad","bad","good","good","bad","bad","good","good","good","bad","good","good","good","good","bad","bad","good"],
                   'postlcode' : ["46062","11236","11236","11236","11236","46062","11236","46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"],
                   'companyname': ["A","B","C","C","C","A","A","A","B","B","A","A","B","A","A","B","A","A"]}
                   )
                   
print(df)

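-----Updated the sample dataframe above as suggested-----

I tried to produce the result in Excel, but using COUNTIF and COUNTIFS brings my desktop to a halt, and even when it works the task takes minutes to complete. I would appreciate some help and guidance.

Here is what I am trying to achieve:

I want to score the reputation of companies within several zip codes, based on the data collected. The columns I need to produce: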

  1. count in zip code
  2. count of good scores in zip
  3. dividegoodscore% (2/1)
  4. ranking
  • I was able to produce 1

    op = df.groupby(['zip,company'])['zip,company'].count()

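  • I am having difficulty with 2: I want to keep the output of 1, but after applying the code below it becomes 0. I only want column 2 to show the count of "good" scores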

    op = op.groupby(['zip,company'])[['zip,company','countgoodscoreunderzip']].apply(lambda x: x[x=='good'].count())

  • Then for 3, I guess it is a matter of taking 2 and dividing it by 1

  • For 4, I don't know how to rank in pandas; it is probably a simple ranking

The Excel screenshot shows the ideal output (updated to match the sample dataframe).

Thanks for reading.

[Image: Excel screenshot of the ideal output]

Named aggregation should help with the first two columns:

op = df.groupby('zip,company', as_index=False).aggregate(
    countinzipcode=('zip,company', 'count'),
    goodscoreinzip=('goodbadscore', lambda s: s.eq('good').sum())
)

op

  zip,company  countinzipcode  goodscoreinzip
0     11236|A               7               4
1     11236|B               5               3
2     11236|C               3               2
3     46062|A               3               2
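
As a variation on the named aggregation above, the lambda can be avoided by first building a boolean helper column and summing it per group. This is only a minimal sketch: the helper column name `is_good` and the small five-row frame are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

# Small illustrative frame (an assumption, not the question's full sample data).
df = pd.DataFrame({
    'zip,company': ["11236|A", "11236|A", "11236|B", "11236|B", "11236|B"],
    'goodbadscore': ["good", "bad", "good", "good", "bad"],
})

# Hypothetical helper column: True where the score is "good".
df['is_good'] = df['goodbadscore'].eq('good')

op = df.groupby('zip,company', as_index=False).agg(
    # Number of rows per group (counts non-null scores).
    countinzipcode=('goodbadscore', 'count'),
    # Summing booleans counts the True values, i.e. the "good" rows.
    goodscoreinzip=('is_good', 'sum'),
)

print(op)
```

Passing the string `'sum'` instead of a Python lambda lets pandas use its built-in grouped sum, which may matter on larger frames.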

Simple arithmetic then gives the percentage for 3:

op['goodscore%'] = op['goodscoreinzip'] / op['countinzipcode'] * 100
  zip,company  countinzipcode  goodscoreinzip  goodscore%
0     11236|A               7               4   57.142857
1     11236|B               5               3   60.000000
2     11236|C               3               2   66.666667
3     46062|A               3               2   66.666667
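
Since the mean of a boolean series is the fraction of `True` values, the same percentage can also be obtained in one grouped step, without the intermediate count columns. The following is a sketch under the assumption that only the percentage is needed; the small frame is again invented for illustration.

```python
import pandas as pd

# Small illustrative frame (an assumption, not the question's full sample data).
df = pd.DataFrame({
    'zip,company': ["11236|A", "11236|A", "11236|B", "11236|B", "11236|B"],
    'goodbadscore': ["good", "bad", "good", "good", "bad"],
})

pct = (
    df['goodbadscore'].eq('good')      # boolean series: True for "good"
      .groupby(df['zip,company'])      # group the booleans by the key column
      .mean()                          # fraction of True values per group
      .mul(100)                        # convert the fraction to a percentage
      .rename('goodscore%')
      .reset_index()
)

print(pct)
```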

Then rank can be used to get the ranking for 4:

op['ranking'] = op['goodscore%'].rank(ascending=False, method='dense').astype(int)

op

  zip,company  countinzipcode  goodscoreinzip  goodscore%  ranking
0     11236|A               7               4   57.142857        3
1     11236|B               5               3   60.000000        2
2     11236|C               3               2   66.666667        1
3     46062|A               3               2   66.666667        1
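
The choice of `method` only matters when there are ties, as with the two groups at 66.666667 here. The sketch below compares `method='dense'` with `method='min'` on the percentages from the output above; which one matches the Excel ranking in the screenshot is an assumption that would need checking against the image.

```python
import pandas as pd

# Percentages copied from the output shown above.
pct = pd.Series(
    [57.142857, 60.0, 66.666667, 66.666667],
    index=["11236|A", "11236|B", "11236|C", "46062|A"],
)

# 'dense': tied values share a rank and the next rank follows without a gap.
dense = pct.rank(ascending=False, method='dense').astype(int)

# 'min': tied values share the lowest rank and the following ranks are skipped.
competition = pct.rank(ascending=False, method='min').astype(int)

print(pd.DataFrame({'goodscore%': pct, 'dense': dense, 'min': competition}))
```

With `dense` the ranks come out as 3, 2, 1, 1, while `min` gives 4, 3, 1, 1, because the two tied groups occupy the first two positions.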

Sample data used (constructed from the numbers in the image rather than from the code constructor in the question):

df = pd.DataFrame({
    'zip,company': ["46062|A", "11236|B", "11236|C", "11236|C",
                    "11236|C", "11236|A", "11236|A", "11236|A",
                    "11236|B", "11236|B", "11236|A", "11236|A",
                    "11236|B", "11236|A", "11236|A", "11236|B",
                    "46062|A", "46062|A"],
    'goodbadscore': ["good", "bad", "bad", "good", "good", "bad",
                     "bad", "good", "good", "good", "bad",
                     "good", "good", "good", "good", "bad",
                     "bad", "good"],
    'postlcode': ["46062", "11236", "11236", "11236", "11236",
                  "46062", "11236", "46062", "11236", "11236",
                  "11236", "11236", "11236", "11236", "11236",
                  "11236", "11236", "11236"],
    'companyname': ["A", "B", "C", "C", "C", "A", "A", "A", "B",
                    "B", "A", "A", "B", "A", "A", "B", "A", "A"]
})
   zip,company goodbadscore postlcode companyname
0      46062|A         good     46062           A
1      11236|B          bad     11236           B
2      11236|C          bad     11236           C
3      11236|C         good     11236           C
4      11236|C         good     11236           C
5      11236|A          bad     46062           A
6      11236|A          bad     11236           A
7      11236|A         good     46062           A
8      11236|B         good     11236           B
9      11236|B         good     11236           B
10     11236|A          bad     11236           A
11     11236|A         good     11236           A
12     11236|B         good     11236           B
13     11236|A         good     11236           A
14     11236|A         good     11236           A
15     11236|B          bad     11236           B
16     46062|A          bad     11236           A
17     46062|A         good     11236           A
