How to group by, count, and unstack a pandas DataFrame based on multiple column values?

Question

I have the following pandas DataFrame, in which I stored the win/loss results of multiple models for multiple companies:

company	Model_1	Winloss	Model_2	Winloss2
Company1	KNN	W	GPR	L
Company1	KNN	L	PLS	W
Company1	KNN	L	KRR	W
Company1	KNN	L	XGB	W
Company1	GPR	L	SGD	W
Company2	GPR	L	PLS	W
Company2	KRR	L	XGB	W

I want to group by both company and model, and count the wins and losses for each model within the same company, so that I can later unstack the result and have the output look like this:

('company', '')	('DT', 'L')	('DT', 'W')	('GPR', 'L')	('KNN', 'L')	('KNN', 'W')	('KRR', 'W')	('PLS', 'W')	('SGD', 'W')	('SVR', 'L')	('SVR', 'W')
Company1	3.0	2.0	5.0	3.0	1.0	1.0	1.0	1.0	2.0	1.0
Company2	6.0	2.0	0.0	2.0	1.0	0.0	0.0	0.0	6.0	1.0
Company3	0.0	1.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Company4	6.0	1.0	5.0	0.0	1.0	0.0	0.0	0.0	0.0	1.0
Company5	7.0	1.0	5.0	0.0	1.0	0.0	0.0	0.0	0.0	2.0

The table above is the result of the following code, but the counted values are not accurate:

WLPerCompany = WinLoss.groupby(['company', 'Model_1', 'Winloss'])['Winloss'].count()
WinLossResults = pd.DataFrame(WLPerCompany)
WinLossResults.columns = [*WinLossResults.columns[:-1], 'counts']
WinLossResults = WinLossResults['counts'].unstack(level=['Model_1', 'Winloss'])
WinLossResults = WinLossResults.fillna(0)
WinLossResults

Answer 1

Use wide_to_long to reshape the data first, and then crosstab to count:

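# Rename 'Winloss' to 'Winloss1' so both result columns share the stub 'Winloss',
# then reshape the Model_N / WinlossN column pairs from wide to long format.
df = pd.wide_to_long(df.reset_index().rename(columns={'Winloss':'Winloss1'}),
                     stubnames=['Model_','Winloss'],
                     i=['index','company'],
                     j='tmp').reset_index()

# Count every (model, result) combination per company.
df = pd.crosstab(df['company'], [df['Model_'], df['Winloss']])
print(df)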
Model_   GPR KNN    KRR    PLS SGD XGB
Winloss    L   L  W   L  W   W   W   W
company                               
Company1   2   3  1   0  1   1   1   1
Company2   1   0  0   1  0   1   0   1


Answer 1 | 2 | Accepted | 2022-05-11 10:26:12
