How to group, count, and unstack a pandas DataFrame based on multiple column values?
I have the following pandas DataFrame, in which I stored the Win/Loss results of multiple models for multiple companies:
company | Model_1 | Winloss | Model_2 | Winloss2 |
---|---|---|---|---|
Company1 | KNN | W | GPR | L |
Company1 | KNN | L | PLS | W |
Company1 | KNN | L | KRR | W |
Company1 | KNN | L | XGB | W |
Company1 | GPR | L | SGD | W |
Company2 | GPR | L | PLS | W |
Company2 | KRR | L | XGB | W |
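To experiment with the approaches below, the sample rows can be rebuilt as a DataFrame. This is a minimal sketch: the variable name `WinLoss` is borrowed from the question's code, and only the seven sample rows shown above are included, not the full data behind the desired-output table.

```python
import pandas as pd

# The seven sample rows from the table above; each row records one win and one loss
WinLoss = pd.DataFrame({
    'company':  ['Company1'] * 5 + ['Company2'] * 2,
    'Model_1':  ['KNN', 'KNN', 'KNN', 'KNN', 'GPR', 'GPR', 'KRR'],
    'Winloss':  ['W', 'L', 'L', 'L', 'L', 'L', 'L'],
    'Model_2':  ['GPR', 'PLS', 'KRR', 'XGB', 'SGD', 'PLS', 'XGB'],
    'Winloss2': ['L', 'W', 'W', 'W', 'W', 'W', 'W'],
})
print(WinLoss)
```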
I want to group by both company and model and count the wins and losses of each model within the same company, so that I can later unstack the result and get output that looks like this:
('company', '') | ('DT', 'L') | ('DT', 'W') | ('GPR', 'L') | ('KNN', 'L') | ('KNN', 'W') | ('KRR', 'W') | ('PLS', 'W') | ('SGD', 'W') | ('SVR', 'L') | ('SVR', 'W') |
---|---|---|---|---|---|---|---|---|---|---|
Company1 | 3.0 | 2.0 | 5.0 | 3.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 |
Company2 | 6.0 | 2.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 6.0 | 1.0 |
Company3 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Company4 | 6.0 | 1.0 | 5.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
Company5 | 7.0 | 1.0 | 5.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
The table above is the result of the following code, but the counted values are not accurate:
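```python
import pandas as pd

# Count W/L per (company, model); note: only the Model_1/Winloss pair is grouped here
WLPerCompany = WinLoss.groupby(['company', 'Model_1', 'Winloss'])['Winloss'].count()
WinLossResults = pd.DataFrame(WLPerCompany)
WinLossResults.columns = [*WinLossResults.columns[:-1], 'counts']
# Move Model_1 and Winloss from the row index into the columns
WinLossResults = WinLossResults['counts'].unstack(level=['Model_1', 'Winloss'])
WinLossResults = WinLossResults.fillna(0)
WinLossResults
```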
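Because that `groupby` only sees the `Model_1`/`Winloss` pair, every result stored in `Model_2`/`Winloss2` is dropped from the counts. One possible fix that keeps the question's groupby/unstack style (an alternative to reshaping with `wide_to_long`, not the answerer's method) is to stack both pairs into one long frame with `pd.concat` first. A minimal sketch, assuming the column names shown in the question; `long_df`, `model`, `result`, and `counts` are illustrative names:

```python
import pandas as pd

WinLoss = pd.DataFrame({
    'company':  ['Company1'] * 5 + ['Company2'] * 2,
    'Model_1':  ['KNN', 'KNN', 'KNN', 'KNN', 'GPR', 'GPR', 'KRR'],
    'Winloss':  ['W', 'L', 'L', 'L', 'L', 'L', 'L'],
    'Model_2':  ['GPR', 'PLS', 'KRR', 'XGB', 'SGD', 'PLS', 'XGB'],
    'Winloss2': ['L', 'W', 'W', 'W', 'W', 'W', 'W'],
})

# Stack (Model_1, Winloss) and (Model_2, Winloss2) under common column names
long_df = pd.concat([
    WinLoss[['company', 'Model_1', 'Winloss']]
        .rename(columns={'Model_1': 'model', 'Winloss': 'result'}),
    WinLoss[['company', 'Model_2', 'Winloss2']]
        .rename(columns={'Model_2': 'model', 'Winloss2': 'result'}),
], ignore_index=True)

# Count each (company, model, result), move model/result into the columns,
# fill unseen combinations with 0, and sort the column labels
counts = (long_df.groupby(['company', 'model', 'result'])
                 .size()
                 .unstack(['model', 'result'], fill_value=0)
                 .sort_index(axis=1))
print(counts)
```

Passing `fill_value=0` to `unstack` keeps the counts as integers, so no separate `fillna(0)` is needed.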
Use `wide_to_long` to reshape first, and then `crosstab`:
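```python
import pandas as pd

# Rename Winloss -> Winloss1 so both result columns share the stub 'Winloss' plus a numeric suffix
df = pd.wide_to_long(df.reset_index().rename(columns={'Winloss': 'Winloss1'}),
                     stubnames=['Model_', 'Winloss'],
                     i=['index', 'company'],
                     j='tmp').reset_index()
# Count each (company, model, result) combination; unseen combinations become 0
df = pd.crosstab(df['company'], [df['Model_'], df['Winloss']])
print(df)
```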
```
Model_   GPR KNN    KRR    PLS SGD XGB
Winloss    L   L  W   L  W   W   W   W
company
Company1   2   3  1   0  1   1   1   1
Company2   1   0  0   1  0   1   0   1
```
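The desired output in the question shows a `('company', '')` column, which is what `reset_index()` produces when a DataFrame with two-level column labels moves its index back into the columns. A minimal end-to-end sketch of the answer on the sample rows with that extra step added; the names `long_df`, `result`, and `flat` are illustrative:

```python
import pandas as pd

WinLoss = pd.DataFrame({
    'company':  ['Company1'] * 5 + ['Company2'] * 2,
    'Model_1':  ['KNN', 'KNN', 'KNN', 'KNN', 'GPR', 'GPR', 'KRR'],
    'Winloss':  ['W', 'L', 'L', 'L', 'L', 'L', 'L'],
    'Model_2':  ['GPR', 'PLS', 'KRR', 'XGB', 'SGD', 'PLS', 'XGB'],
    'Winloss2': ['L', 'W', 'W', 'W', 'W', 'W', 'W'],
})

# One row per (original row, model slot); the suffix 1 or 2 ends up in 'tmp'
long_df = pd.wide_to_long(
    WinLoss.reset_index().rename(columns={'Winloss': 'Winloss1'}),
    stubnames=['Model_', 'Winloss'],
    i=['index', 'company'],
    j='tmp',
).reset_index()

# Company vs. (model, result) counts; absent pairs are filled with 0
result = pd.crosstab(long_df['company'], [long_df['Model_'], long_df['Winloss']])

# Move 'company' out of the index; with two column levels it becomes ('company', '')
flat = result.reset_index()
print(flat)
```

Because `crosstab` fills absent combinations with 0 and keeps integer counts, the `fillna(0)` step from the question's code is not needed here.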