
Sort a data frame in python with duplicates by a string list

I have a data frame with 250 names and their values, imported into Python via pandas read_csv. It reads in the data like this:

name val1 val2 val3
George 2.5 1.1 1.0
George 3.1 1.4 0.0
George 1.1 0.9 4.1
Tom 2.1 1.2 -3.0
Tom 3.0 -1.2 3.5
Tom 7.3 5.2 -1.2
Tom 0.1 0.1 0.1
... ... ... ...
Sally 6.1 9.1 -5.6
Sally 5.7 4.7 9.1

I want to reorder these by a particular order:

neworder = ['Sally', ..., 'George', 'Tom']

so that the result looks like this:

name val1 val2 val3
Sally 6.1 9.1 -5.6
Sally 5.7 4.7 9.1
... ... ... ...
George 2.5 1.1 1.0
George 3.1 1.4 0.0
George 1.1 0.9 4.1
Tom 2.1 1.2 -3.0
Tom 3.0 -1.2 3.5
Tom 7.3 5.2 -1.2
Tom 0.1 0.1 0.1

In IDL I would do this with some for loops, but I suspect there's a sorting function in Python that my Google skills have not been able to find.

Create a lookup dictionary that maps each name to its sort position, either numbered by hand or generated from the ordered list:

name_order = {'Sally':1, ... , 'George':12, 'Tom':13} # hand-numbered
neworder = ['Sally', ... , 'George', 'Tom']
name_order = {nm:ix for ix,nm in enumerate(neworder)} # generated

Then pass a lambda that maps the names through that dictionary to the key parameter of sort_values (available in pandas 1.1.0 and later):

df.sort_values(by='name', key=lambda nm: nm.map(name_order))
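
For anyone who wants to try this end to end, here is a minimal sketch. It assumes a small stand-in data frame built from a few rows of the question and a shortened neworder list (the full 250-name list is elided in the post). The kind="mergesort" argument is my addition: the default sort algorithm is not guaranteed to be stable, and a stable sort keeps the rows of each name in their original relative order, which is what the desired output shows.

```python
import pandas as pd

# Stand-in for the CSV data; values are copied from a few rows of the question.
df = pd.DataFrame(
    {
        "name": ["George", "George", "Tom", "Tom", "Sally", "Sally"],
        "val1": [2.5, 3.1, 2.1, 3.0, 6.1, 5.7],
        "val2": [1.1, 1.4, 1.2, -1.2, 9.1, 4.7],
        "val3": [1.0, 0.0, -3.0, 3.5, -5.6, 9.1],
    }
)

# Shortened order list (assumption: the real list has many more names).
neworder = ["Sally", "George", "Tom"]

# Map each name to its position in the desired order.
name_order = {nm: ix for ix, nm in enumerate(neworder)}

# sort_values returns a new data frame; df itself is left unchanged.
# kind="mergesort" is a stable sort, so rows within a name keep their order.
sorted_df = df.sort_values(
    by="name",
    key=lambda nm: nm.map(name_order),
    kind="mergesort",
)

print(sorted_df)
```

Note that the result has to be assigned to a variable (or back to df), since sort_values does not sort in place unless inplace=True is passed.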

I'd need to think a bit about what happens if an unexpected name appears; you might be able to cope with this by making name_order a collections.defaultdict.
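
A rough sketch of the defaultdict idea, under some assumptions: the name 'Zed' is a made-up unexpected entry, and sending unknown names to the end (rank len(neworder)) is just one possible policy. As far as I know, with a plain dict the unmapped names become NaN after map, and sort_values puts NaN last by default, so the defaultdict mainly matters when you want explicit control over where unknown names go.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical data containing a name ('Zed') that is not in the order list.
df = pd.DataFrame(
    {
        "name": ["Tom", "Zed", "Sally", "George", "Zed"],
        "val1": [2.1, 9.9, 6.1, 2.5, 8.8],
    }
)

# Shortened order list (assumption: the real list is longer).
neworder = ["Sally", "George", "Tom"]

# Known names get their list position; any other name gets len(neworder),
# which places it after every listed name.
name_order = defaultdict(
    lambda: len(neworder),
    {nm: ix for ix, nm in enumerate(neworder)},
)

# Series.map uses the default of a dict subclass that defines __missing__.
# kind="mergesort" keeps the original order among rows with equal rank.
sorted_df = df.sort_values(
    by="name",
    key=lambda nm: nm.map(name_order),
    kind="mergesort",
)

print(sorted_df)
```

To put unknown names first instead, the default could return -1 rather than len(neworder).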

This is the solution:

neworder = ['Sally', ... , 'George', 'Tom']
name_order = {nm:ix for ix,nm in enumerate(neworder)} # generated
df.sort_values(by='name', key=lambda nm: nm.map(name_order))

Thanks @Joffan and @ShubhamSharma
