Anomalous ordering of column names in pandas
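When I export a DataFrame from pandas to an Excel spreadsheet, my columns come out in the order shown below: "10 Largest Event" is placed second, right after "1 Largest Event", where "2 Largest Event" should be. I would like the columns to appear in numeric order, i.e. "1 Largest Event", "2 Largest Event", "10 Largest Event".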
ID_1 Permit No. ID_2 1 Largest Event 10 Largest Event 2 Largest Event
10220 To Be Permitted 0010001-24.1 4.0548 0.822 3.9611
Why does this happen? It is a minor formatting glitch, but it can be quite annoying.
Use natsorted from natsort together with reindex:
from natsort import natsorted
l=['1 Largest Event','10 Largest Event','2 Largest Event']
natsorted(l)
Out[789]: ['1 Largest Event', '2 Largest Event', '10 Largest Event']
df=df.reindex(columns=natsorted(list(df)))
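If installing natsort is not an option, a similar ordering can be approximated with the standard library. The following is only a minimal sketch of the idea of natural sorting, not natsort's actual implementation; the helper name natural_key is hypothetical. It splits each name into text and digit chunks and compares the digit chunks as integers.

```python
import re

def natural_key(name):
    """Build a sort key in which runs of digits compare as integers."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so the list alternates: text, digits, text, digits, ...
    parts = re.split(r'(\d+)', name)
    # Odd positions hold the captured digit runs; even positions hold text.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

cols = ['1 Largest Event', '10 Largest Event', '2 Largest Event']
ordered = sorted(cols, key=natural_key)
print(ordered)  # ['1 Largest Event', '2 Largest Event', '10 Largest Event']
```

With a DataFrame, the resulting list could be passed to reindex in the same way as above, e.g. df.reindex(columns=sorted(df.columns, key=natural_key)).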
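The problem is that your column names are sorted lexicographically, as strings. You therefore need to sort with a custom key function that converts the first whitespace-separated token to int: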
df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]
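To see why a plain string sort misplaces the column, it may help to compare the names directly. The short sketch below uses only built-in Python; the variable names are illustrative.

```python
cols = ['1 Largest Event', '2 Largest Event', '10 Largest Event']

# Strings are compared character by character, so '10 ...' sorts
# before '2 ...' because the character '1' is less than '2'.
lexicographic = sorted(cols)
print(lexicographic)  # ['1 Largest Event', '10 Largest Event', '2 Largest Event']

print('10 Largest Event' < '2 Largest Event')  # True: string comparison
print(int('10') < int('2'))                    # False: numeric comparison

# Sorting on the leading token converted to int gives numeric order.
numeric = sorted(cols, key=lambda x: int(x.split()[0]))
print(numeric)  # ['1 Largest Event', '2 Largest Event', '10 Largest Event']
```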
Sample:
import pandas as pd

cols = ['1 Largest Event',
'10 Largest Event',
'2 Largest Event',
'3 Largest Event',
'4 Largest Event',
'5 Largest Event',
'6 Largest Event',
'7 Largest Event',
'8 Largest Event',
'9 Largest Event']
df = pd.DataFrame(0, columns=cols, index=[0])
print (df)
1 Largest Event 10 Largest Event 2 Largest Event 3 Largest Event \
0 0 0 0 0
4 Largest Event 5 Largest Event 6 Largest Event 7 Largest Event \
0 0 0 0 0
8 Largest Event 9 Largest Event
0 0 0
df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]
print (df)
1 Largest Event 2 Largest Event 3 Largest Event 4 Largest Event \
0 0 0 0 0
5 Largest Event 6 Largest Event 7 Largest Event 8 Largest Event \
0 0 0 0 0
9 Largest Event 10 Largest Event
0 0 0
Edit:
You can also keep the first three columns in place and sort only the remaining ones:
df = df[df.columns[:3].tolist() + sorted(df.columns[3:], key=lambda x: int(x.split()[0]))]
print (df)
ID_1 Permit No. ID_2 1 Largest Event 2 Largest Event \
0 10220 To Be Permitted 0010001-24.1 4.0548 3.9611
10 Largest Event
0 0.822
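The slice above assumes that exactly the first three columns are non-numeric. If that position is not guaranteed, one possible variation is to detect the numbered columns by name. This is a sketch under that assumption; the helpers leading_int and order_columns are hypothetical names, not part of pandas. Note that it moves every column without a leading integer to the front, which matches the slice-based version only when those columns already come first.

```python
def leading_int(name):
    """Return the leading integer of a column name, or None if it has none."""
    tokens = str(name).split()
    if tokens and tokens[0].isdecimal():
        return int(tokens[0])
    return None

def order_columns(columns):
    """Keep un-numbered columns first (original order), then numbered ones in numeric order."""
    fixed = [c for c in columns if leading_int(c) is None]
    numbered = sorted((c for c in columns if leading_int(c) is not None),
                      key=leading_int)
    return fixed + numbered

cols = ['ID_1', 'Permit No.', 'ID_2',
        '1 Largest Event', '10 Largest Event', '2 Largest Event']
result = order_columns(cols)
print(result)
```

With a DataFrame this would be applied as df = df[order_columns(df.columns)].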