Anomalous ordering of column names in pandas

When I export my DataFrame from pandas to an Excel spreadsheet, the columns appear in the order shown below: '10 Largest Event' comes right after '1 Largest Event' instead of '2 Largest Event'. I want them in numerical order, i.e. '1 Largest Event', '2 Largest Event', '10 Largest Event'.

ID_1   Permit No.       ID_2          1 Largest Event  10 Largest Event  2 Largest Event
10220  To Be Permitted  0010001-24.1  4.0548           0.822             3.9611

Why is this happening? It's a minor formatting issue, but it's quite an eyesore.

Use natsorted from the natsort package together with reindex:

from natsort import natsorted

l = ['1 Largest Event', '10 Largest Event', '2 Largest Event']
natsorted(l)
Out[789]: ['1 Largest Event', '2 Largest Event', '10 Largest Event']

# reorder the DataFrame columns in natural (numeric-aware) order
df = df.reindex(columns=natsorted(list(df)))
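If installing natsort is not an option, a natural-sort key can be approximated with the standard library. The sketch below is not natsort's implementation; natural_key is a hypothetical helper that splits each name into text and digit runs and compares the digit runs as integers.

```python
import re


def natural_key(name):
    """Hypothetical helper (not natsort's API): a simple natural-sort key."""
    # With a capturing group, re.split keeps the digit runs, so the parts
    # alternate text, digits, text, ...; odd positions are always digit runs.
    parts = re.split(r'(\d+)', str(name))
    # Compare digit runs as integers and text runs case-insensitively.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


cols = ['ID_1', 'Permit No.', 'ID_2',
        '1 Largest Event', '10 Largest Event', '2 Largest Event']
print(sorted(cols, key=natural_key))
# ['1 Largest Event', '2 Largest Event', '10 Largest Event', 'ID_1', 'ID_2', 'Permit No.']
```

It would be applied as df = df.reindex(columns=sorted(df.columns, key=natural_key)). Sorting every column this way also moves the non-numbered columns (ID_1, ID_2, Permit No.) after the numbered ones, because the numbered names start with an empty text run; to keep them in front, sort only the tail columns as in the EDIT below.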

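The problem is that your columns are sorted as strings, in lexicographical order: strings are compared character by character, so '10 Largest Event' sorts before '2 Largest Event' because '1' comes before '2'.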
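A quick way to see this in plain Python, independent of pandas (the column names are taken from the question):

```python
cols = ['2 Largest Event', '10 Largest Event', '1 Largest Event']

# Strings compare character by character, so '10...' < '2...' because '1' < '2'.
print('10 Largest Event' < '2 Largest Event')  # True

# '1 Largest Event' still comes before '10 Largest Event' because a space
# sorts before the character '0'.
print(sorted(cols))  # ['1 Largest Event', '10 Largest Event', '2 Largest Event']
```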

So you need a custom sort key: split each column name on whitespace and convert the first token to an int:

df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]

Sample:

import pandas as pd

cols = ['1 Largest Event', 
        '10 Largest Event', 
        '2 Largest Event',
        '3 Largest Event',
        '4 Largest Event',
        '5 Largest Event', 
        '6 Largest Event', 
        '7 Largest Event', 
        '8 Largest Event', 
        '9 Largest Event']

df = pd.DataFrame(0, columns=cols, index=[0])
print (df)
   1 Largest Event  10 Largest Event  2 Largest Event  3 Largest Event  \
0                0                 0                0                0   

   4 Largest Event  5 Largest Event  6 Largest Event  7 Largest Event  \
0                0                0                0                0   

   8 Largest Event  9 Largest Event  
0                0                0  

df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]
print (df)

   1 Largest Event  2 Largest Event  3 Largest Event  4 Largest Event  \
0                0                0                0                0   

   5 Largest Event  6 Largest Event  7 Largest Event  8 Largest Event  \
0                0                0                0                0   

   9 Largest Event  10 Largest Event  
0                0                 0  

EDIT:

You can also sort only the columns after the first three, keeping ID_1, Permit No. and ID_2 in front:

df = df[df.columns[:3].tolist() + sorted(df.columns[3:], key=lambda x: int(x.split()[0]))]
print (df)
    ID_1       Permit No.          ID_2  1 Largest Event  2 Largest Event  \
0  10220  To Be Permitted  0010001-24.1           4.0548           3.9611   

   10 Largest Event  
0             0.822  
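Applied to the question's full DataFrame, the key int(x.split()[0]) raises ValueError on 'ID_1', which is why the EDIT slices off the first three columns. If the number of leading non-numbered columns may vary, a key can rank them instead of hard-coding 3. A minimal sketch, assuming that columns not starting with an integer should stay in front in their original order; event_key is a hypothetical name:

```python
import re


def event_key(name):
    """Hypothetical sort key: columns starting with an integer are ordered by
    that integer; all other columns rank ahead of them."""
    match = re.match(r'\s*(\d+)', str(name))
    if match is None:
        # Non-numbered column such as 'ID_1': identical keys, so the stable
        # sort keeps these in their original relative order.
        return (0, 0)
    return (1, int(match.group(1)))


cols = ['ID_1', 'Permit No.', 'ID_2',
        '1 Largest Event', '10 Largest Event', '2 Largest Event']
print(sorted(cols, key=event_key))
# ['ID_1', 'Permit No.', 'ID_2', '1 Largest Event', '2 Largest Event', '10 Largest Event']
```

On a DataFrame it would be used as df = df[sorted(df.columns, key=event_key)]. Python's sorted is stable, so the non-numbered columns keep their existing order.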
