How to group, count, and unstack a pandas DataFrame based on multiple column values?

I have the following pandas DataFrame, in which I store the win/loss results of multiple models for multiple companies:

company   Model_1  Winloss  Model_2  Winloss2
Company1  KNN      W        GPR      L
Company1  KNN      L        PLS      W
Company1  KNN      L        KRR      W
Company1  KNN      L        XGB      W
Company1  GPR      L        SGD      W
Company2  GPR      L        PLS      W
Company2  KRR      L        XGB      W

I want to group by both company and model and count the wins and losses for each model within the same company, so that I can later unstack the result and get output that looks like this:

('company', '')  ('DT', 'L')  ('DT', 'W')  ('GPR', 'L')  ('KNN', 'L')  ('KNN', 'W')  ('KRR', 'W')  ('PLS', 'W')  ('SGD', 'W')  ('SVR', 'L')  ('SVR', 'W')
Company1         3.0          2.0          5.0           3.0           1.0           1.0           1.0           1.0           2.0           1.0
Company2         6.0          2.0          0.0           2.0           1.0           0.0           0.0           0.0           6.0           1.0
Company3         0.0          1.0          0.0           0.0           0.0           0.0           0.0           0.0           0.0           0.0
Company4         6.0          1.0          5.0           0.0           1.0           0.0           0.0           0.0           0.0           1.0
Company5         7.0          1.0          5.0           0.0           1.0           0.0           0.0           0.0           0.0           2.0

The table above is the result of the following code, but the counts it produces are not accurate:

WLPerCompany = WinLoss.groupby(['company', 'Model_1', 'Winloss'])['Winloss'].count()
WinLossResults = pd.DataFrame(WLPerCompany)
WinLossResults.columns = [*WinLossResults.columns[:-1], 'counts']
WinLossResults = WinLossResults['counts'].unstack(level=['Model_1', 'Winloss'])
WinLossResults = WinLossResults.fillna(0)
WinLossResults
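
One likely reason for the mismatch, judging only from the sample rows shown in the question: the grouping looks at Model_1/Winloss alone, so results recorded under Model_2/Winloss2 never enter the count. The sketch below rebuilds the sample table (the name first_only is illustrative, and `.size()` stands in for the `count()` call) and shows that a GPR loss for Company1 stored in the second pair of columns is left out.

```python
import pandas as pd

# Sample rows copied from the question's table.
WinLoss = pd.DataFrame({
    "company":  ["Company1"] * 5 + ["Company2"] * 2,
    "Model_1":  ["KNN", "KNN", "KNN", "KNN", "GPR", "GPR", "KRR"],
    "Winloss":  ["W", "L", "L", "L", "L", "L", "L"],
    "Model_2":  ["GPR", "PLS", "KRR", "XGB", "SGD", "PLS", "XGB"],
    "Winloss2": ["L", "W", "W", "W", "W", "W", "W"],
})

# Same grouping keys as the question's code: only Model_1/Winloss are counted.
first_only = (
    WinLoss.groupby(["company", "Model_1", "Winloss"])
    .size()
    .unstack(level=["Model_1", "Winloss"], fill_value=0)
)

# GPR losses for Company1 that the grouping above sees.
print(first_only.loc["Company1", ("GPR", "L")])

# GPR losses for Company1 that live in the second pair of columns.
in_second = WinLoss.query(
    "company == 'Company1' and Model_2 == 'GPR' and Winloss2 == 'L'"
)
print(len(in_second))
```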

Use wide_to_long to reshape the data first, then crosstab:

df = pd.wide_to_long(df.reset_index().rename(columns={'Winloss':'Winloss1'}), 
                     stubnames=['Model_','Winloss'], 
                     i=['index','company'], 
                     j='tmp').reset_index()

df = pd.crosstab(df['company'], [df['Model_'], df['Winloss']])
print(df)
Model_   GPR KNN    KRR    PLS SGD XGB
Winloss    L   L  W   L  W   W   W   W
company                               
Company1   2   3  1   0  1   1   1   1
Company2   1   0  0   1  0   1   0   1
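
For readers who prefer the group/count/unstack wording of the question, here is a minimal alternative sketch, not the method used above: it stacks the two (model, result) column pairs with concat, then counts with groupby and unstack. The sample frame is rebuilt from the question's table, and the names pairs, model, result, and counts are illustrative.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "company":  ["Company1"] * 5 + ["Company2"] * 2,
    "Model_1":  ["KNN", "KNN", "KNN", "KNN", "GPR", "GPR", "KRR"],
    "Winloss":  ["W", "L", "L", "L", "L", "L", "L"],
    "Model_2":  ["GPR", "PLS", "KRR", "XGB", "SGD", "PLS", "XGB"],
    "Winloss2": ["L", "W", "W", "W", "W", "W", "W"],
})

# Give both (model, result) pairs the same column names ...
first = df[["company", "Model_1", "Winloss"]].rename(
    columns={"Model_1": "model", "Winloss": "result"}
)
second = df[["company", "Model_2", "Winloss2"]].rename(
    columns={"Model_2": "model", "Winloss2": "result"}
)

# ... and stack them into one long table with one row per model result.
pairs = pd.concat([first, second], ignore_index=True)

# Count rows per (company, model, result), then move model/result to columns.
counts = (
    pairs.groupby(["company", "model", "result"])
    .size()
    .unstack(level=["model", "result"], fill_value=0)
    .sort_index(axis=1)
)
print(counts)
```

On the sample rows this should give the same counts as the crosstab output, since both approaches count every (company, model, result) combination across the two column pairs.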
