Create multiple columns with calculations using groupby, aggregate functions in Pandas
import pandas as pd
df = pd.DataFrame({'zip,company': ["46062|A","11236|B","11236|C","11236|C","11236|C","11236|A","11236|A","11236|A","11236|B","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A"],
'goodbadscore': ["good","bad","bad","good","good","bad","bad","good","good","good","bad","good","good","good","good","bad","bad","good"],
'postlcode' : ["46062","11236","11236","11236","11236","46062","11236","46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"],
'companyname': ["A","B","C","C","C","A","A","A","B","B","A","A","B","A","A","B","A","A"]}
)
print(df)
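-----Updated the sample dataframe above as suggested-----
I tried to generate the results in Excel, but using COUNTIF and COUNTIFS crashed my desktop, and even when it works, the task takes several minutes to complete. I'm hoping to get some help and guidance.
Here is what I am trying to achieve:
I want to score each company's reputation across several zip codes based on the collected data. The columns I need to produce:
I was able to produce 1: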
op = df.groupby(['zip+company'])['zip+company'].count()
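I'm having trouble with 2: I want to keep the output from 1, but it turns to 0 after applying the following. I only want column 2 to count the "good" rows.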
op = op.groupby(['zip+company'])[['zip+company','countgoodscoreunderzip']].apply(lambda x: x[x=='good'].count())
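As for 3, I think it is a matter of taking 2 and dividing it by 1.
For 4, I don't know how to rank in pandas; it is probably a simple ranking.
The Excel picture shows the ideal output (updated with the sample dataframe).
Thanks for reading.
Named aggregation should help with the first two columns: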
op = df.groupby('zip,company', as_index=False).aggregate(
    countinzipcode=('zip,company', 'count'),
    goodscoreinzip=('goodbadscore', lambda s: s.eq('good').sum())
)
op:
   zip,company  countinzipcode  goodscoreinzip
0      11236|A               7               4
1      11236|B               5               3
2      11236|C               3               2
3      46062|A               3               2
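As a side note, the lambda is not strictly required. A minimal sketch, assuming a hypothetical helper column `is_good` and a small made-up frame `demo` (neither is part of the question's data): flag the "good" rows as booleans first, then let the built-in 'sum' aggregation count them.

```python
import pandas as pd

# Small made-up frame, only for illustration (not the data from the question)
demo = pd.DataFrame({
    'zip,company': ['1|A', '1|A', '1|A', '2|B', '2|B'],
    'goodbadscore': ['good', 'bad', 'good', 'bad', 'bad'],
})

demo_op = (
    # is_good is a hypothetical helper column: True where the score is 'good'
    demo.assign(is_good=demo['goodbadscore'].eq('good'))
        .groupby('zip,company', as_index=False)
        .agg(
            # number of rows in each group
            countinzipcode=('goodbadscore', 'count'),
            # summing booleans counts the True values
            goodscoreinzip=('is_good', 'sum'),
        )
)
print(demo_op)
```

This should give the same two columns as the lambda version; which one reads better is mostly a matter of taste.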
Simple arithmetic can then be used to get the percentage for 3:
op['goodscore%'] = op['goodscoreinzip'] / op['countinzipcode'] * 100
   zip,company  countinzipcode  goodscoreinzip  goodscore%
0      11236|A               7               4   57.142857
1      11236|B               5               3   60.000000
2      11236|C               3               2   66.666667
3      46062|A               3               2   66.666667
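If only the percentage is needed, it can probably be computed in one step, because the mean of a boolean Series is the share of True values. A rough sketch on a made-up frame `demo` (an assumption for illustration, not the question's data):

```python
import pandas as pd

# Small made-up frame, only for illustration (not the data from the question)
demo = pd.DataFrame({
    'zip,company': ['1|A', '1|A', '1|A', '2|B', '2|B'],
    'goodbadscore': ['good', 'bad', 'good', 'bad', 'bad'],
})

pct = (
    demo['goodbadscore'].eq('good')      # boolean mask: True for 'good'
        .groupby(demo['zip,company'])    # group the mask by the key column
        .mean()                          # fraction of True values per group
        .mul(100)                        # convert the fraction to a percentage
        .rename('goodscore%')
        .reset_index()
)
print(pct)
```

This skips the intermediate count columns, so it only fits when those counts are not needed in the output.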
Then rank can be used to get the ranking for 4:
op['ranking'] = op['goodscore%'].rank(ascending=False, method='dense').astype(int)
op:
   zip,company  countinzipcode  goodscoreinzip  goodscore%  ranking
0      11236|A               7               4   57.142857        3
1      11236|B               5               3   60.000000        2
2      11236|C               3               2   66.666667        1
3      46062|A               3               2   66.666667        1
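The choice of method='dense' matters when there are ties, as with the two 66.666667 rows here. A small sketch comparing a few rank methods on the percentages from the output above (the `ranks` frame and its column names are only for illustration):

```python
import pandas as pd

# Percentages copied from the goodscore% column shown above
pct = pd.Series([57.142857, 60.0, 66.666667, 66.666667], name='goodscore%')

ranks = pd.DataFrame({
    'goodscore%': pct,
    # dense: tied values share a rank and the next rank is not skipped
    'dense': pct.rank(ascending=False, method='dense').astype(int),
    # min: tied values share the lowest rank and following ranks are skipped
    'min': pct.rank(ascending=False, method='min').astype(int),
    # first: ties are broken by order of appearance
    'first': pct.rank(ascending=False, method='first').astype(int),
})
print(ranks)
```

With 'dense' the ranks run 1, 1, 2, 3 without gaps, which seems to match the ranking asked for in the question; 'min' would give 1, 1, 3, 4 instead.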
Sample data used (based on the numbers in the image, not the code constructor):
df = pd.DataFrame({
    'zip,company': ["46062|A", "11236|B", "11236|C", "11236|C",
                    "11236|C", "11236|A", "11236|A", "11236|A",
                    "11236|B", "11236|B", "11236|A", "11236|A",
                    "11236|B", "11236|A", "11236|A", "11236|B",
                    "46062|A", "46062|A"],
    'goodbadscore': ["good", "bad", "bad", "good", "good", "bad",
                     "bad", "good", "good", "good", "bad",
                     "good", "good", "good", "good", "bad",
                     "bad", "good"],
    'postlcode': ["46062", "11236", "11236", "11236", "11236",
                  "46062", "11236", "46062", "11236", "11236",
                  "11236", "11236", "11236", "11236", "11236",
                  "11236", "11236", "11236"],
    'companyname': ["A", "B", "C", "C", "C", "A", "A", "A", "B",
                    "B", "A", "A", "B", "A", "A", "B", "A", "A"]
})
    zip,company  goodbadscore  postlcode  companyname
0       46062|A          good      46062            A
1       11236|B           bad      11236            B
2       11236|C           bad      11236            C
3       11236|C          good      11236            C
4       11236|C          good      11236            C
5       11236|A           bad      46062            A
6       11236|A           bad      11236            A
7       11236|A          good      46062            A
8       11236|B          good      11236            B
9       11236|B          good      11236            B
10      11236|A           bad      11236            A
11      11236|A          good      11236            A
12      11236|B          good      11236            B
13      11236|A          good      11236            A
14      11236|A          good      11236            A
15      11236|B           bad      11236            B
16      46062|A           bad      11236            A
17      46062|A          good      11236            A