
Anomalous ordering of column names in pandas

When I export my DataFrame from pandas to an Excel spreadsheet, the columns appear in the order shown below: '10 Largest Event' comes right after '1 Largest Event', instead of '2 Largest Event'. I want them in numerical order, i.e. '1 Largest Event', '2 Largest Event', '10 Largest Event'.

ID_1    Permit No.        ID_2           1 Largest Event    10 Largest Event    2 Largest Event
10220   To Be Permitted   0010001-24.1   4.0548             0.822               3.9611

Why is this happening? It's a minor formatting error, but it can be quite the eyesore.

Use natsorted from the natsort package together with reindex:

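from natsort import natsorted  # third-party package: pip install natsort
l = ['1 Largest Event', '10 Largest Event', '2 Largest Event']
natsorted(l)
Out[789]: ['1 Largest Event', '2 Largest Event', '10 Largest Event']
# list(df) yields the column names; reindex rearranges the columns into the natural order
df = df.reindex(columns=natsorted(list(df)))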
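If installing natsort is not an option, a rough stand-in can be sketched with the standard library. This is not natsort's own algorithm, only a simplified natural-order key; the name natural_key is made up for this example, and it assumes every column name is a string:

```python
import re

def natural_key(name):
    # Hypothetical helper, not part of natsort or pandas.
    # re.split with a capturing group alternates non-digit and digit runs,
    # so every odd-indexed piece is a run of digits.
    parts = re.split(r'(\d+)', name)
    # Digit runs become ints (compared numerically); the rest is lowercased text.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

cols = ['1 Largest Event', '10 Largest Event', '2 Largest Event']
ordered = sorted(cols, key=natural_key)
print(ordered)
```

With a DataFrame this would be used as df = df.reindex(columns=sorted(df.columns, key=natural_key)). Like the natsorted call above, it reorders every column, including the ID columns.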

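The problem is that your column names are strings, so they are sorted in lexicographical order: they are compared character by character, and '10 Largest Event' comes before '2 Largest Event' because '1' sorts before '2'.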
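A small illustration of the difference, using only the built-in sorted (the variable names are just for this example):

```python
cols = ['1 Largest Event', '10 Largest Event', '2 Largest Event']

# Strings are compared character by character: '1' sorts before '2',
# so '10 Largest Event' lands ahead of '2 Largest Event'.
as_strings = sorted(cols)

# The same leading numbers compared as integers give the expected order.
as_numbers = sorted(int(c.split()[0]) for c in cols)

print(as_strings)
print(as_numbers)
```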

So you need to sort with a custom key function that takes the first whitespace-separated token of each name and converts it to int:

df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]

Sample:

import pandas as pd

cols = ['1 Largest Event', 
        '10 Largest Event', 
        '2 Largest Event',
        '3 Largest Event',
        '4 Largest Event',
        '5 Largest Event', 
        '6 Largest Event', 
        '7 Largest Event', 
        '8 Largest Event', 
        '9 Largest Event']

df = pd.DataFrame(0, columns=cols, index=[0])
print (df)
   1 Largest Event  10 Largest Event  2 Largest Event  3 Largest Event  \
0                0                 0                0                0   

   4 Largest Event  5 Largest Event  6 Largest Event  7 Largest Event  \
0                0                0                0                0   

   8 Largest Event  9 Largest Event  
0                0                0  

df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]
print (df)

   1 Largest Event  2 Largest Event  3 Largest Event  4 Largest Event  \
0                0                0                0                0   

   5 Largest Event  6 Largest Event  7 Largest Event  8 Largest Event  \
0                0                0                0                0   

   9 Largest Event  10 Largest Event  
0                0                 0  

EDIT:

You can also keep the first 3 columns in place and sort only the remaining ones:

df = df[df.columns[:3].tolist() + sorted(df.columns[3:], key=lambda x: int(x.split()[0]))]
print (df)
    ID_1       Permit No.          ID_2  1 Largest Event  2 Largest Event  \
0  10220  To Be Permitted  0010001-24.1           4.0548           3.9611   

   10 Largest Event  
0             0.822  
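If the number of leading non-numeric columns is not fixed, the hard-coded 3 can be avoided. The sketch below is one possible approach, not part of the original answer: order_columns is a hypothetical helper that assumes all column names are strings and treats any name whose first token is not an integer as a fixed column.

```python
def order_columns(names):
    """Keep names without a leading integer first, in their original order,
    then append the numbered names sorted by that integer."""
    fixed, numbered = [], []
    for name in names:
        parts = name.split()
        first = parts[0] if parts else ''
        try:
            numbered.append((int(first), name))
        except ValueError:
            # First token is not an integer (e.g. 'ID_1'): leave it where it is.
            fixed.append(name)
    numbered.sort(key=lambda pair: pair[0])
    return fixed + [name for _, name in numbered]

cols = ['ID_1', 'Permit No.', 'ID_2',
        '1 Largest Event', '10 Largest Event', '2 Largest Event']
result = order_columns(cols)
print(result)
```

Applied to the DataFrame, this would read df = df[order_columns(df.columns)].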


 