I want to sort groups of rows based on a column. In my example, 'Group' is the column to group by, and I then want to sort the groups while maintaining the row order within each group. I can't sort by index because the index is purposefully out of order as a result of previous operations.
import pandas as pd

df = pd.DataFrame({
    'Group': [5, 5, 5, 9, 9, 777, 777, 1, 2, 2],
    'V1': ['a', 'b', 'a', 3, 6, 1, None, 10, 3, None],
    'V2': ['blah', 'blah', 'blah', 'dog', 'cat', 'cat', 'na', 'first', 'last', 'nada'],
    'V3': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
})
And want it to look like this:
I've tried various things like
df.groupby(['Group'])['Group'].aggregate({'min grp':'min'}).sort_values(by=['min grp'], ascending=True)
If it helps, the original df was created via pd.concat(list-of-dataframes), and when I sorted it afterwards by Group it also sorted the rows within each group based on the index, which does not work for my specific problem.
You need to use sort_values with the option kind='mergesort'. From the pandas docs:
kind : {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also ndarray.np.sort for more
information. mergesort is the only stable algorithm. For DataFrames,
this option is only applied when sorting on a single column or label.
A sorting algorithm is called stable when elements with equal keys appear in the output in the same order as they appear in the input. Stable sorts include insertion sort, merge sort, bubble sort, Timsort, and counting sort.
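To illustrate what stability means outside pandas, here is a minimal sketch using Python's built-in sorted(), which is guaranteed to be stable. The records list and its labels are made up for illustration and are not from the question.

```python
# Each record is (key, label); the labels 'a'..'f' record the input order.
records = [(5, 'a'), (9, 'b'), (5, 'c'), (1, 'd'), (9, 'e'), (5, 'f')]

# Sort on the key only. Because sorted() is stable, records that share a
# key keep their relative input order ('a' before 'c' before 'f', etc.).
stable_sorted = sorted(records, key=lambda r: r[0])

print(stable_sorted)
# [(1, 'd'), (5, 'a'), (5, 'c'), (5, 'f'), (9, 'b'), (9, 'e')]
```

An unstable algorithm would still put the keys in order, but it would be free to emit the three records with key 5 in any order.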
So you need:
df = df.sort_values('Group', kind='mergesort')
When you call sort_values without kind, it defaults to 'quicksort', and quicksort is not stable.
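As a rough check of this approach, the sketch below rebuilds the DataFrame from the question and gives it a deliberately shuffled index. The index values are an assumption standing in for whatever pd.concat produced in the original case. It then sorts on Group with kind='mergesort' and prints the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [5, 5, 5, 9, 9, 777, 777, 1, 2, 2],
    'V1': ['a', 'b', 'a', 3, 6, 1, None, 10, 3, None],
    'V2': ['blah', 'blah', 'blah', 'dog', 'cat', 'cat', 'na', 'first', 'last', 'nada'],
    'V3': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
})

# Assumption: an arbitrary out-of-order index, made up for this example.
df.index = [7, 3, 9, 0, 4, 8, 1, 6, 5, 2]

# Stable sort on a single column: groups are ordered by 'Group', and rows
# inside each group stay in their original positional order.
result = df.sort_values('Group', kind='mergesort')

print(result)
```

The rows come back in the index order 6, 5, 2, 7, 3, 9, 0, 4, 8, 1, which shows that the sort followed row position within each group rather than the index labels.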
If I understand your question correctly, you don't want to group by, but to sort by the values of your column Group. You can do it with DataFrame.sort_values():
df.sort_values('Group', inplace=True)
You can also do it this way. Without inplace=True, sort_values returns a new sorted DataFrame, so assign the result:
df = df.sort_values(by=["Group"])