Sort pandas df subset of rows (within a group) by specific column

Let's say I have the following dataframe:

df


A B C D E
z k s 7 d
z k s 6 l
x t r 2 e
x t r 1 x
u c r 8 f
u c r 9 h
y t s 5 l
y t s 2 o

I would like to sort it by column D within each group of rows (here, a group is the set of rows that share the same values in columns A, B and C).

The expected output would be:

df


A B C D E
z k s 6 l
z k s 7 d
x t r 1 x
x t r 2 e
u c r 8 f
u c r 9 h
y t s 2 o
y t s 5 l

How can I do this kind of operation?

I think it should be as simple as this:

df = df.sort_values(["A", "B", "C", "D"])
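
One caveat, shown in the minimal sketch below: sorting on all four columns also sorts the groups themselves alphabetically by A, B and C, so the groups come out as u, x, y, z rather than in their original z, x, u, y order. If the order of the groups does not matter, the one-liner is enough. The dataframe is rebuilt from the table in the question, and the name `by_all` is only illustrative.

```python
import pandas as pd

# Example data copied from the table in the question
df = pd.DataFrame({
    'A': list('zzxxuuyy'),
    'B': list('kkttcctt'),
    'C': list('ssrrrrss'),
    'D': [7, 6, 2, 1, 8, 9, 5, 2],
    'E': list('dlexfhlo'),
})

# Sorting on A, B, C and D orders D within each group,
# but it also reorders the groups themselves (u, x, y, z).
by_all = df.sort_values(['A', 'B', 'C', 'D'])
print(by_all['A'].tolist())  # ['u', 'u', 'x', 'x', 'y', 'y', 'z', 'z']
print(by_all['D'].tolist())  # [8, 9, 1, 2, 2, 5, 6, 7]
```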

You can use groupby and sort_values (credit also to @Henry Ecker for his comment):

df.groupby(['A', 'B', 'C'], group_keys=False, sort=False).apply(pd.DataFrame.sort_values, 'D')

Output:

    A   B   C   D   E
1   z   k   s   6   l
0   z   k   s   7   d
3   x   t   r   1   x
2   x   t   r   2   e
4   u   c   r   8   f
5   u   c   r   9   h
7   y   t   s   2   o
6   y   t   s   5   l
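
Some recent pandas releases may emit a warning when `groupby(...).apply(...)` operates on the grouping columns. As a hedged alternative (not part of the original answer), the same result can be sketched by iterating over the groups and concatenating the sorted pieces. The names `pieces` and `result` are illustrative.

```python
import pandas as pd

# Example data copied from the table in the question
df = pd.DataFrame({
    'A': list('zzxxuuyy'),
    'B': list('kkttcctt'),
    'C': list('ssrrrrss'),
    'D': [7, 6, 2, 1, 8, 9, 5, 2],
    'E': list('dlexfhlo'),
})

# Iterating over a groupby yields (key, sub-frame) pairs; with sort=False
# the groups arrive in order of first appearance (z, x, u, y).
pieces = [group.sort_values('D')
          for _, group in df.groupby(['A', 'B', 'C'], sort=False)]

# Stitch the sorted groups back together; the original index labels are kept.
result = pd.concat(pieces)
print(result)
```

Call `result.reset_index(drop=True)` afterwards if a fresh 0..n-1 index is preferred.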

Let us try ngroup to create a helper column:

df['new1'] = df.groupby(['A','B','C'],sort=False).ngroup()
df = df.sort_values(['new1','D']).drop('new1',axis=1)
df
   A  B  C  D  E
1  z  k  s  6  l
0  z  k  s  7  d
3  x  t  r  1  x
2  x  t  r  2  e
4  u  c  r  8  f
5  u  c  r  9  h
7  y  t  s  2  o
6  y  t  s  5  l
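
The ngroup idea can also be wrapped in a small helper that leaves the input dataframe untouched. This is only a sketch: the function name `sort_within_groups` and the temporary column name `_grp` are hypothetical, and it assumes `_grp` is not already a column in the dataframe.

```python
import pandas as pd

def sort_within_groups(df, group_cols, sort_col):
    """Sort rows by sort_col inside each group, keeping groups in first-seen order."""
    # ngroup() with sort=False numbers the groups in order of first appearance.
    group_id = df.groupby(group_cols, sort=False).ngroup()
    return (
        df.assign(_grp=group_id)            # temporary helper column (hypothetical name)
          .sort_values(['_grp', sort_col])  # group order first, then sort_col
          .drop(columns='_grp')
    )

# Example data copied from the table in the question
df = pd.DataFrame({
    'A': list('zzxxuuyy'),
    'B': list('kkttcctt'),
    'C': list('ssrrrrss'),
    'D': [7, 6, 2, 1, 8, 9, 5, 2],
    'E': list('dlexfhlo'),
})

out = sort_within_groups(df, ['A', 'B', 'C'], 'D')
print(out)
```

Because `assign` returns a copy, `df` itself never gains the helper column.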
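
Another option is to rebuild the dataframe and sort each group with apply:

import pandas as pd

# Example dataframe from the question
dic = {
    'A': [*'zzxxuuyy'],
    'B': [*'kkttcctt'],
    'C': [*'ssrrrrss'],
    'D': [*map(int, '76218952')],
    'E': [*'dlexfhlo']
}
df = pd.DataFrame(dic)

# sort=False keeps the groups in their original order; drop=True avoids adding an extra 'index' column
df.groupby(['A', 'B'], sort=False).apply(lambda g: g.sort_values('D')).droplevel(['A', 'B']).reset_index(drop=True)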

If you want to group by columns 'A', 'B', 'C' and 'E' instead, pass those columns as the grouping keys:

df.groupby(['A', 'B', 'C', 'E'], sort=False).apply(lambda g: g.sort_values('D')).droplevel(['A', 'B', 'C', 'E']).reset_index(drop=True)
