Pandas: sorting a dataframe on the basis of another column's value
I have this sorted pandas dataframe df.
I use df = df.sort_values(['col_a', 'col_b'])
col_a col_b
a 6
a 7
a 8
a 11
b 5
b 10
b 12
c 11
c 13
c 14
But I'd like to sort df on the basis of the min and max values of col_b, so that the col_a value with the min col_b value comes first and the col_a value with the max col_b value comes last:
col_a col_b
b 5
b 10
b 12
a 6
a 7
a 8
a 11
c 11
c 13
c 14
Is there a fast way to do this kind of sorting using a pandas function?
EDIT 1:
@Primer's solution works for a two-column df. With this df
col_a col_b col_c
0 a 6 9
1 a 7 8
2 a 8 7
3 a 11 6
4 b 5 5
5 b 10 4
6 b 12 3
7 c 11 2
8 c 13 1
9 c 14 0
it returns:
ValueError: Wrong number of items passed 2, placement implies 1
EDIT 2:
from pandas import DataFrame

d = {'col_a': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'],
     'col_b': [6, 7, 8, 11, 12, 13, 11, 13, 14],
     'col_c': [9, 8, 7, 6, 5, 4, 3, 2, 1]}
df = DataFrame(d)
returns:
col_a col_b col_c
0 a 6 9
1 a 7 8
2 a 8 7
3 a 11 6
4 b 12 5
5 b 13 4
6 c 11 3
7 c 13 2
8 c 14 1
@Primer, with this df your code doesn't work, because it returns:
col_a col_b col_c
0 a 6 9
1 a 7 8
2 a 8 7
3 a 11 6
4 c 11 3
5 c 13 2
6 c 14 1
7 b 12 5
8 b 13 4
I need to have:
col_a col_b col_c
0 a 6 9
1 a 7 8
2 a 8 7
3 a 11 6
4 b 12 5
5 b 13 4
6 c 11 3
7 c 13 2
8 c 14 1
because the c group has max(value)=14, whereas your code takes the max(min) values.
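You could do this:
# per-row minimum of col_b within each col_a group
df['min'] = df.groupby('col_a')['col_b'].transform('min')
# order whole groups by that minimum, then drop the helper column
df = df.sort_values(['min', 'col_a', 'col_b']).reset_index(drop=True).drop(columns='min')
df
Which yields: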
col_a col_b
0 b 5
1 b 10
2 b 12
3 a 6
4 a 7
5 a 8
6 a 11
7 c 11
8 c 13
9 c 14
EDIT:
I have fixed the code above to make sure transform is used on a Series and not on a DataFrame (thus avoiding the error).
It works for me, returning:
col_a col_b col_c
0 b 5 5
1 b 10 4
2 b 12 3
3 a 6 9
4 a 7 8
5 a 8 7
6 a 11 6
7 c 11 2
8 c 13 1
9 c 14 0
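To illustrate why selecting the column before calling transform matters, here is a minimal sketch. It assumes the earlier version of the answer called transform on the whole grouped frame (that version is not shown here, so this is a guess at the cause). The names wide and narrow are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col_a": list("aaaabbbccc"),
    "col_b": [6, 7, 8, 11, 5, 10, 12, 11, 13, 14],
    "col_c": [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
})

# Transform on the whole grouped frame: the result is a DataFrame with
# one column per non-grouping column (here col_b and col_c).
wide = df.groupby("col_a").transform("min")

# Transform on a single selected column: the result is a Series aligned
# with df's index, which fits into one new column.
narrow = df.groupby("col_a")["col_b"].transform("min")

print(type(wide).__name__, list(wide.columns))
print(type(narrow).__name__, narrow.name)
```

With only col_a and col_b in the frame, the whole-frame result has a single column, which would explain why the problem only showed up once col_c was added. Assigning the two-column result to df['min'] is what would raise a ValueError; the exact message depends on the pandas version.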
I guess you could easily turn this sorting step into a function that you apply to a dataframe.
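One way such a function might look is sketched below. The name sort_groups, its parameters, and the temporary column name _key are all hypothetical and not part of pandas; the function returns a new frame rather than modifying the input. The agg parameter is an added assumption: it lets the groups be ordered by something other than the minimum.

```python
import pandas as pd

def sort_groups(df, group_col, value_col, agg="min"):
    """Return a copy of df with whole groups ordered by an aggregate of value_col."""
    # Hypothetical helper; assumes df has no column already named "_key".
    key = df.groupby(group_col)[value_col].transform(agg)
    return (
        df.assign(_key=key)
          .sort_values(["_key", group_col, value_col])
          .drop(columns="_key")
          .reset_index(drop=True)
    )

# The three-column frame from EDIT 1, ordered by each group's minimum.
df = pd.DataFrame({
    "col_a": list("aaaabbbccc"),
    "col_b": [6, 7, 8, 11, 5, 10, 12, 11, 13, 14],
    "col_c": [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
})
by_min = sort_groups(df, "col_a", "col_b")

# The frame from EDIT 2, ordered by each group's maximum instead.
df2 = pd.DataFrame({
    "col_a": list("aaaabbccc"),
    "col_b": [6, 7, 8, 11, 12, 13, 11, 13, 14],
    "col_c": [9, 8, 7, 6, 5, 4, 3, 2, 1],
})
by_max = sort_groups(df2, "col_a", "col_b", agg="max")
```

Here by_min puts the groups in the order b, a, c, matching the output above, and by_max puts the EDIT 2 groups in the order a, b, c, which is the order asked for there. Whether ordering by the group maximum is the rule intended in general is an assumption; the two examples in the question are not both satisfied by a single aggregate.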