Sort pandas df subset of rows (within a group) by specific column
Let's say I have the following dataframe:
df
A B C D E
z k s 7 d
z k s 6 l
x t r 2 e
x t r 1 x
u c r 8 f
u c r 9 h
y t s 5 l
y t s 2 o
I would like to sort it by column D within each subset of rows (here, the rows that share the same values in columns A, B and C).
The expected output would be:
df
A B C D E
z k s 6 l
z k s 7 d
x t r 1 x
x t r 2 e
u c r 8 f
u c r 9 h
y t s 2 o
y t s 5 l
Any help for this kind of operation?
I think it should be as simple as this:
df = df.sort_values(["A", "B", "C", "D"])
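One caveat with the one-liner above, shown as a minimal sketch on the question's sample data (the variable names `by_all`, `group_order_original` and `group_order_sorted` are only for illustration): sorting on all four columns does order D within each group, but it also appears to reorder the groups themselves alphabetically, which differs from the expected output where the groups stay in z, x, u, y order.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "A": list("zzxxuuyy"),
    "B": list("kkttcctt"),
    "C": list("ssrrrrss"),
    "D": [7, 6, 2, 1, 8, 9, 5, 2],
    "E": list("dlexfhlo"),
})

# Sorting on all four columns orders D inside each group,
# but also sorts the groups by their A, B, C values.
by_all = df.sort_values(["A", "B", "C", "D"])

# Compare the order in which groups appear before and after sorting.
group_order_original = df["A"].drop_duplicates().tolist()
group_order_sorted = by_all["A"].drop_duplicates().tolist()
print(group_order_original)  # ['z', 'x', 'u', 'y']
print(group_order_sorted)    # ['u', 'x', 'y', 'z']
```

If the order of the groups does not matter to you, this is probably the simplest option.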
You can use groupby and sort values (also credit to @Henry Ecker for his comment):
df.groupby(['A','B','C'],group_keys=False,sort=False).apply(pd.DataFrame.sort_values,'D')
Output:
A B C D E
1 z k s 6 l
0 z k s 7 d
3 x t r 1 x
2 x t r 2 e
4 u c r 8 f
5 u c r 9 h
7 y t s 2 o
6 y t s 5 l
Let us try ngroup to create a helper column:
df['new1'] = df.groupby(['A','B','C'],sort=False).ngroup()
df = df.sort_values(['new1','D']).drop('new1',axis=1)
df
A B C D E
1 z k s 6 l
0 z k s 7 d
3 x t r 1 x
2 x t r 2 e
4 u c r 8 f
5 u c r 9 h
7 y t s 2 o
6 y t s 5 l
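The ngroup idea can also be wrapped in a small function that leaves the original dataframe untouched. This is only a sketch: the function name `sort_within_groups` and the temporary column name `_group_id` are made up for illustration, and it assumes no existing column already has that name.

```python
import pandas as pd

def sort_within_groups(df, group_cols, sort_col):
    """Sort rows by sort_col inside each group, keeping groups in first-appearance order."""
    # Hypothetical temporary column name; assumed not to clash with a real column.
    helper = "_group_id"
    # With sort=False, ngroup numbers the groups in order of first appearance.
    group_id = df.groupby(group_cols, sort=False).ngroup()
    return (
        df.assign(**{helper: group_id})     # works on a copy, df is not modified
          .sort_values([helper, sort_col])  # group number first, then the sort column
          .drop(columns=helper)
    )

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "A": list("zzxxuuyy"),
    "B": list("kkttcctt"),
    "C": list("ssrrrrss"),
    "D": [7, 6, 2, 1, 8, 9, 5, 2],
    "E": list("dlexfhlo"),
})

result = sort_within_groups(df, ["A", "B", "C"], "D")
print(result)
```

The original index labels are kept, so you can add `.reset_index(drop=True)` if you want a fresh 0..n-1 index.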
import pandas as pd
dic = {
'A': [*'zzxxuuyy'],
'B': [*'kkttcctt'],
'C': [*'ssrrrrss'],
'D': [*map(int, '76218952')],
'E': [*'dlexfhlo']
}
df = pd.DataFrame(dic)
df.groupby(['A', 'B']).apply(lambda df: df.sort_values('D')).droplevel(['A', 'B']).reset_index()
If you want to sort based on columns 'A', 'B', 'C', 'E', then you have to:
df.groupby(['A', 'B', 'D', 'E']).apply(lambda df: df.sort_values('D')).droplevel(['A', 'B', 'D', 'E']).reset_index()