
groupby + apply results in a series appearing both in index and column - how to prevent it?

I have the following data frame:

import pandas as pd

dict1 = {'id': {0: 11, 1: 12, 2: 13, 3: 14, 4: 15, 5: 16, 6: 19, 7: 18, 8: 17},
 'var1': {0: 20.272108843537413,
  1: 21.088435374149658,
  2: 20.68027210884354,
  3: 23.945578231292515,
  4: 22.857142857142854,
  5: 21.496598639455787,
  6: 39.18367346938776,
  7: 36.46258503401361,
  8: 34.965986394557824},
 'var2': {0: 27.731092436974773,
  1: 43.907563025210074,
  2: 55.67226890756303,
  3: 62.81512605042017,
  4: 71.63865546218487,
  5: 83.40336134453781,
  6: 43.48739495798319,
  7: 59.243697478991606,
  8: 67.22689075630252},
 'var3': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}}
ex = pd.DataFrame(dict1).set_index('id')

I wanted to sort within groups according to var1, so I wrote the following:

ex.groupby('var3').apply(lambda x: x.sort_values('var1'))

However, the result is a data frame that has var3 both in the index and as a column. How can I prevent that and keep it only as a column?

You could use:

# drop takes a boolean; drop=True discards both index levels (var3 and id)
df_sorted = ex.groupby('var3').apply(lambda x: x.sort_values('var1')).reset_index(drop=True)
print(df_sorted)

        var1       var2  var3
0  20.272109  27.731092     1
1  20.680272  55.672269     1
2  21.088435  43.907563     1
3  21.496599  83.403361     1
4  22.857143  71.638655     1
5  23.945578  62.815126     1
6  34.965986  67.226891     2
7  36.462585  59.243697     2
8  39.183673  43.487395     2

But you only need DataFrame.sort_values, sorting first by var3 and then by var1:

df_sort=ex.sort_values(['var3','var1'])
print(df_sort)

         var1       var2  var3
id                            
11  20.272109  27.731092     1
13  20.680272  55.672269     1
12  21.088435  43.907563     1
16  21.496599  83.403361     1
15  22.857143  71.638655     1
14  23.945578  62.815126     1
17  34.965986  67.226891     2
18  36.462585  59.243697     2
19  39.183673  43.487395     2
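
To check that the plain sort keeps id as the index and var3 as an ordinary column, a minimal sketch follows. It uses rounded copies of the question's values, and the names small and by_sort are illustrative only.

```python
import pandas as pd

# Rounded copy of the question's data (illustrative, not the exact values).
small = pd.DataFrame(
    {
        "id": [11, 12, 13, 14, 15, 16, 19, 18, 17],
        "var1": [20.27, 21.09, 20.68, 23.95, 22.86, 21.50, 39.18, 36.46, 34.97],
        "var3": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    }
).set_index("id")

# Sort by group first, then by var1 inside each group; no groupby needed.
by_sort = small.sort_values(["var3", "var1"])

# id stays as the single-level index and var3 stays a regular column.
print(by_sort.index.tolist())
print(by_sort.columns.tolist())
```

If id is wanted as a regular column as well, calling by_sort.reset_index() afterwards moves it out of the index.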

Add the optional parameter as_index=False to groupby:

ex.groupby('var3', as_index=False) \
  .apply(lambda x: x.sort_values('var1'))

Or, if you don't want a MultiIndex:

ex.groupby('var3', as_index=False) \
  .apply(lambda x: x.sort_values('var1')) \
  .reset_index(level=0, drop=True)
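
Another option, not part of the answer above, is the group_keys=False argument of groupby, which asks apply not to prepend the group keys to the result index. The sketch below uses rounded copies of the question's values, and the names small and no_keys are illustrative only. Note that pandas 2.2 deprecated letting apply operate on the grouping columns, so whether var3 remains among the columns may depend on the installed version.

```python
import pandas as pd

# Rounded copy of the question's data (illustrative, not the exact values).
small = pd.DataFrame(
    {
        "id": [11, 12, 13, 14, 15, 16, 19, 18, 17],
        "var1": [20.27, 21.09, 20.68, 23.95, 22.86, 21.50, 39.18, 36.46, 34.97],
        "var3": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    }
).set_index("id")

# group_keys=False: do not add the var3 group key as an extra index level.
no_keys = small.groupby("var3", group_keys=False).apply(
    lambda g: g.sort_values("var1")
)

# The index is the original id, ordered by group and then by var1.
print(no_keys.index.nlevels)
print(no_keys.index.tolist())
```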
