
Pandas: sorting a dataframe on the basis of another column's values

I have this sorted Pandas dataframe df.

I use df = df.sort(col_a,col_b)

 col_a col_b 
   a     6      
   a     7      
   a     8     
   a     11           
   b     5      
   b     10
   b     12
   c     11      
   c     13      
   c     14     

But I'd like to sort df on the basis of the min and max of col_b, so that the col_a group with the minimum col_b value comes first and the col_a group with the maximum col_b value comes last:

 col_a col_b      
   b     5      
   b     10
   b     12
   a     6      
   a     7      
   a     8      
   a     11      
   c     11      
   c     13      
   c     14    

Is there a fast way to do this kind of sorting using a pandas function?

EDIT 1:

@Primer's solution works for a two-column df. With this df

    col_a  col_b  col_c
0     a      6      9
1     a      7      8
2     a      8      7
3     a     11      6
4     b      5      5
5     b     10      4
6     b     12      3
7     c     11      2
8     c     13      1
9     c     14      0

it returns

ValueError: Wrong number of items passed 2, placement implies 1

EDIT 2:

from pandas import DataFrame

d = {'col_a': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'],
     'col_b': [6, 7, 8, 11, 12, 13, 11, 13, 14],
     'col_c': [9, 8, 7, 6, 5, 4, 3, 2, 1]
     }

df = DataFrame(d)

returns:

  col_a  col_b  col_c
0     a      6      9
1     a      7      8
2     a      8      7
3     a     11      6
4     b     12      5
5     b     13      4
6     c     11      3
7     c     13      2
8     c     14      1

@Primer With this df your code doesn't work, because it returns:

  col_a  col_b  col_c
0     a      6      9
1     a      7      8
2     a      8      7
3     a     11      6
4     c     11      3
5     c     13      2
6     c     14      1
7     b     12      5
8     b     13      4

I need to have

  col_a  col_b  col_c
0     a      6      9
1     a      7      8
2     a      8      7
3     a     11      6
4     b     12      5
5     b     13      4
6     c     11      3
7     c     13      2
8     c     14      1

because the c group has max(value) = 14, whereas your code orders the groups by their min values.

You could do this:

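# Helper column: the minimum col_b within each col_a group
df['min'] = df.groupby('col_a')['col_b'].transform('min')
# df.sort() was removed from pandas; sort_values() is its replacement
df = df.sort_values(['min', 'col_a', 'col_b']).reset_index(drop=True).drop(columns='min')
df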

Which yields:

  col_a  col_b
0     b      5
1     b     10
2     b     12
3     a      6
4     a      7
5     a      8
6     a     11
7     c     11
8     c     13
9     c     14

EDIT:

I have fixed the code above to make sure transform is used on a series and not on a dataframe (thus avoiding the error).

It works for me, returning:
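As a rough illustration of why that matters, here is a minimal sketch that rebuilds the three-column frame from EDIT 1 and compares the two calls. The names frame_result and series_result are only illustrative, and the exact error wording may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'col_a': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'col_b': [6, 7, 8, 11, 5, 10, 12, 11, 13, 14],
    'col_c': [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
})

# Transforming the whole grouped frame returns one column per non-key column
# (col_b and col_c here), which cannot be assigned to the single column df['min'].
frame_result = df.groupby('col_a').transform('min')

# Selecting col_b first yields a single Series aligned with df's index,
# which can safely be assigned to one column.
series_result = df.groupby('col_a')['col_b'].transform('min')

print(frame_result.shape)
print(series_result.tolist())
```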

  col_a  col_b  col_c
0     b      5      5
1     b     10      4
2     b     12      3
3     a      6      9
4     a      7      8
5     a      8      7
6     a     11      6
7     c     11      2
8     c     13      1
9     c     14      0

I guess you could easily turn this into a function to apply to a dataframe in place.
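One possible shape for such a function is sketched below. The name sort_groups_by_stat, its parameters, and the temporary column name _key are all hypothetical. It assumes the frame has no existing column called _key, and it returns a new frame rather than modifying the input in place. Passing stat='max' orders the groups by their maximum instead, which appears to be what EDIT 2 of the question asks for.

```python
import pandas as pd


def sort_groups_by_stat(df, group_col, value_col, stat='min'):
    """Return a copy of df whose groups are ordered by a per-group statistic.

    Hypothetical helper: assumes df has no column named '_key'.
    """
    # Per-row sort key: the chosen statistic of value_col within each group.
    key = df.groupby(group_col)[value_col].transform(stat)
    return (
        df.assign(_key=key)
          # group_col breaks ties between groups, value_col orders rows within a group
          .sort_values(['_key', group_col, value_col])
          .drop(columns='_key')
          .reset_index(drop=True)
    )


# Usage with the data from EDIT 2 of the question
df = pd.DataFrame({
    'col_a': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'],
    'col_b': [6, 7, 8, 11, 12, 13, 11, 13, 14],
    'col_c': [9, 8, 7, 6, 5, 4, 3, 2, 1],
})

by_min = sort_groups_by_stat(df, 'col_a', 'col_b')         # group order: a, c, b
by_max = sort_groups_by_stat(df, 'col_a', 'col_b', 'max')  # group order: a, b, c
```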
