I am trying to use itemgetter to do a double sort in Python, but I can't seem to grasp it. What I want to do is first sort by 'var2' and then by 'var4', while keeping it sorted by 'var2'. I have the following piece of code that should help (I adapted it from another SO question):
df = df[['var1', 'var2', 'var3', 'var4']]
df = sorted(df, key=operator.itemgetter(1,2))
but I am not sure what the arguments mean for itemgetter. I tried running it the way it is, but all I get are the variables' names.
I also tried doing
df = sorted(df, key=operator.itemgetter(2,4))
but I get the following error: 'IndexError: string index out of range'.
Please help.
Edit: example
I have four variables: date, time, price and a number. I want to sort the dataframe by date, but within each date I want to sort it by the number. I hope this makes sense.
date time price number
09/02/2008 00:20:38 46.0 9987
09/03/2009 07:00:49 46.65 8551
07/05/2008 07:00:51 46.75 13681
08/02/2008 07:00:57 46.75 14022
09/02/2008 07:01:00 46.75 10270
09/08/2008 07:01:11 46.75 14850
09/02/2008 07:01:22 46.75 20568
08/02/2008 07:01:24 46.75 15683
09/02/2008 07:02:16 46.65 11698
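operator.itemgetter(a, b, c) is equivalent to lambda x: (x[a], x[b], x[c]), not to lambda x: x[a][b][c]. The key function therefore returns a tuple, and sorted compares those tuples element by element, which is exactly a two-level sort.
Your calls fail for a different reason: iterating over a DataFrame yields its column labels, not its rows, so sorted(df, ...) sorts the strings 'var1' to 'var4'. That is why you only get the variables' names back, and why itemgetter(2, 4) raises the IndexError: 'var1' has no character at index 4.
For a plain list of rows (lists or tuples), what you want is:
sorted(rows, key=lambda x: (x[1], x[3]))
or, equivalently, key=operator.itemgetter(1, 3). Note that I changed the indices 2 and 4 to 1 and 3; Python indices start at 0.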
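As a quick check of what itemgetter does with several indices, here is a minimal sketch on a plain list of tuples. The rows are copied from the table in the question; the names rows, get_date_number and by_date_then_number are only for illustration, and the indices 0 and 3 pick the date and number columns of that table.

```python
import operator

# A few rows from the question's table as plain tuples: (date, time, price, number)
rows = [
    ("09/02/2008", "07:01:00", 46.75, 10270),
    ("08/02/2008", "07:01:24", 46.75, 15683),
    ("09/02/2008", "00:20:38", 46.0, 9987),
    ("08/02/2008", "07:00:57", 46.75, 14022),
]

# With several indices, itemgetter returns a tuple of the selected items.
get_date_number = operator.itemgetter(0, 3)
print(get_date_number(rows[0]))  # ('09/02/2008', 10270)

# Tuples compare element by element, so this sorts by date first
# and by number within each date. The dates are strings here,
# so they are ordered as text.
by_date_then_number = sorted(rows, key=get_date_number)
for row in by_date_then_number:
    print(row)
```

This only works because rows is a list of row tuples; it does not apply to a DataFrame passed to sorted directly.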
Since you appear to be using pandas DataFrames, not lists (next time, mention that in your question), here's how you sort a DataFrame by value:
df = df.sort_values(['date', 'number'])
Call df.sort_values with a column or a list of columns to sort by; rows are ordered by the first column, and ties are broken by the next one. By default it returns a new, sorted DataFrame and leaves df unchanged, so assign the result (or pass inplace=True to sort in place).
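For reference, a small self-contained sketch of the DataFrame case, assuming pandas is installed. The data is a subset of the table in the question, and sorted_df is just an illustrative name.

```python
import pandas as pd

# A subset of the question's table.
df = pd.DataFrame(
    {
        "date": ["09/02/2008", "08/02/2008", "09/02/2008", "08/02/2008"],
        "time": ["07:01:00", "07:01:24", "00:20:38", "07:00:57"],
        "price": [46.75, 46.75, 46.0, 46.75],
        "number": [10270, 15683, 9987, 14022],
    }
)

# Sort by date, then by number within each date.
# sort_values returns a new DataFrame; df keeps its original order.
# The date column holds strings here, so it is ordered as text.
sorted_df = df.sort_values(["date", "number"])
print(sorted_df)

# Iterating over a DataFrame yields its column labels, which is why
# sorted(df, ...) only ever returns the column names.
print(list(df))  # ['date', 'time', 'price', 'number']
```

If the dates need to sort chronologically rather than as text, converting the column to a datetime type first would be the usual approach; the exact date format of the data is not stated in the question, so that step is left out here.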