groupby + apply results in a series appearing in both the index and the columns - how to prevent it?

Question

I've got the following data frame:

import pandas as pd

dict1 = {'id': {0: 11, 1: 12, 2: 13, 3: 14, 4: 15, 5: 16, 6: 19, 7: 18, 8: 17},
 'var1': {0: 20.272108843537413,
  1: 21.088435374149658,
  2: 20.68027210884354,
  3: 23.945578231292515,
  4: 22.857142857142854,
  5: 21.496598639455787,
  6: 39.18367346938776,
  7: 36.46258503401361,
  8: 34.965986394557824},
 'var2': {0: 27.731092436974773,
  1: 43.907563025210074,
  2: 55.67226890756303,
  3: 62.81512605042017,
  4: 71.63865546218487,
  5: 83.40336134453781,
  6: 43.48739495798319,
  7: 59.243697478991606,
  8: 67.22689075630252},
 'var3': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}}
ex = pd.DataFrame(dict1).set_index('id')

I wanted to sort within groups according to var1, so I wrote the following:

ex.groupby('var3').apply(lambda x: x.sort_values('var1'))

However, it results in a data frame that has var3 both in the index and as a column. How can I prevent that and keep it only as a column?

Answer 1

You could use:

# drop is a boolean flag; without level, every index level (var3 and id) is discarded
df_sorted=ex.groupby('var3').apply(lambda x: x.sort_values('var1')).reset_index(drop=True)
print(df_sorted)

        var1       var2  var3
0  20.272109  27.731092     1
1  20.680272  55.672269     1
2  21.088435  43.907563     1
3  21.496599  83.403361     1
4  22.857143  71.638655     1
5  23.945578  62.815126     1
6  34.965986  67.226891     2
7  36.462585  59.243697     2
8  39.183673  43.487395     2
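Note that `drop` in `reset_index` is a boolean flag, and without `level` it discards every index level, including `id`. A minimal sketch of the difference follows, assuming a small hypothetical frame (values made up for illustration, not the question's data) whose index is built by hand into the same `(var3, id)` shape, with `var3` also kept as a column. Building it by hand avoids depending on `groupby.apply`, whose handling of grouping columns has been deprecated in recent pandas releases.

```python
import pandas as pd

# Hypothetical toy frame (not the question's data), indexed by 'id'.
ex = pd.DataFrame(
    {'id': [11, 12, 13, 14], 'var1': [2.0, 1.0, 4.0, 3.0], 'var3': [1, 1, 2, 2]}
).set_index('id')

# Reproduce the shape of the groupby/apply result by hand:
# 'var3' stays a column (drop=False) and is also added to the index,
# then the levels are reordered so 'var3' is the outer level.
both = (ex.sort_values(['var3', 'var1'])
          .set_index('var3', append=True, drop=False)
          .reorder_levels(['var3', 'id']))

# Without `level`, every index level is discarded, so 'id' is lost too.
all_dropped = both.reset_index(drop=True)

# With `level`, only the duplicated group key is removed; 'id' survives.
key_dropped = both.reset_index(level='var3', drop=True)

print(all_dropped)
print(key_dropped)
```

If the `id` labels matter, `level='var3'` is the safer choice; `drop=True` alone is only appropriate when a fresh 0..n-1 index is what you want.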

But all you really need is DataFrame.sort_values, sorting first by var3 and then by var1:

df_sort=ex.sort_values(['var3','var1'])
print(df_sort)

         var1       var2  var3
id                            
11  20.272109  27.731092     1
13  20.680272  55.672269     1
12  21.088435  43.907563     1
16  21.496599  83.403361     1
15  22.857143  71.638655     1
14  23.945578  62.815126     1
17  34.965986  67.226891     2
18  36.462585  59.243697     2
19  39.183673  43.487395     2
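To check that the single `sort_values` call gives the same order as sorting each group separately, here is a minimal sketch that compares it with a manual loop over the groups, using the question's data with values rounded to two decimals for brevity (rounding does not change the order of `var1`). Iterating over a `groupby` object yields `(key, sub-frame)` pairs whose sub-frames still contain `var3`, so `pd.concat` reassembles them without putting the key into the index.

```python
import pandas as pd

# The question's data, with values rounded to two decimals for brevity.
ex = pd.DataFrame({
    'id':   [11, 12, 13, 14, 15, 16, 19, 18, 17],
    'var1': [20.27, 21.09, 20.68, 23.95, 22.86, 21.50, 39.18, 36.46, 34.97],
    'var2': [27.73, 43.91, 55.67, 62.82, 71.64, 83.40, 43.49, 59.24, 67.23],
    'var3': [1, 1, 1, 1, 1, 1, 2, 2, 2],
}).set_index('id')

# One multi-key sort: groups ordered by var3, rows within each group by var1.
df_sort = ex.sort_values(['var3', 'var1'])

# Per-group sort without apply: sort each (key, sub-frame) piece and
# concatenate. Groups are visited in sorted key order by default.
per_group = pd.concat([g.sort_values('var1') for _, g in ex.groupby('var3')])

print(df_sort.index.tolist())
print(df_sort.equals(per_group))
```

The multi-key `sort_values` is the simpler choice for this task; the explicit loop only pays off when the per-group operation is more than a sort.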

Answer 2

Add the optional parameter as_index=False to groupby:

ex.groupby('var3', as_index=False) \
  .apply(lambda x: x.sort_values('var1'))

Or, if you don't want a MultiIndex:

ex.groupby('var3', as_index=False) \
  .apply(lambda x: x.sort_values('var1')) \
  .reset_index(level=0, drop=True)

2 answers

Answer 1
Score 1, posted 2019-11-01 20:13:46

Answer 2
Score 1, accepted, posted 2019-11-01 20:57:59