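Anomalous ordering of column names in pandas
When I export a dataframe from pandas to an Excel spreadsheet, my columns come out in the order shown below: "10 Largest Event" is placed second, right after "1 Largest Event", instead of "2 Largest Event". I would like them to appear in numerical order, i.e. "1 Largest Event", "2 Largest Event", "10 Largest Event".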
ID_1 Permit No. ID_2 1 Largest Event 10 Largest Event 2 Largest Event
10220 To Be Permitted 0010001-24.1 4.0548 0.822 3.9611
Why does this happen? It is a minor formatting quirk, but it can be quite annoying.
Use natsorted from natsort together with reindex:
from natsort import natsorted
l=['1 Largest Event','10 Largest Event','2 Largest Event']
natsorted(l)
Out[789]: ['1 Largest Event', '2 Largest Event', '10 Largest Event']
df=df.reindex(columns=natsorted(list(df)))
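If installing natsort is not an option, a similar natural ordering can be approximated with the standard library alone. The sketch below is only an illustration: the helper name natural_key is made up here and is not part of natsort or pandas, and it assumes every column name is a string.

```python
import re

def natural_key(text):
    # Hypothetical helper: split the name into alternating non-digit and
    # digit runs. With a capturing group, re.split puts the digit runs at
    # odd positions, so those are compared as integers and the rest as
    # lower-cased strings.
    parts = re.split(r'(\d+)', text)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

cols = ['1 Largest Event', '10 Largest Event', '2 Largest Event']
sorted_cols = sorted(cols, key=natural_key)
print(sorted_cols)
# ['1 Largest Event', '2 Largest Event', '10 Largest Event']
```

The resulting list can be passed to reindex in the same way as the natsorted output, e.g. df.reindex(columns=sorted(df.columns, key=natural_key)).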
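The problem is that your column names are strings, so they are sorted lexicographically.
You therefore need to sort with a custom key function that converts the first whitespace-separated token to int: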
df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]
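To see where the unexpected order comes from, it may help to compare the two sorts on plain Python lists, without pandas. This is a small illustration using the column names from the question; the variable names are arbitrary.

```python
cols = ['2 Largest Event', '10 Largest Event', '1 Largest Event']

# Default sort compares strings character by character, so '10 ...' sorts
# before '2 ...' because the character '1' is less than '2'.
as_strings = sorted(cols)

# Sorting on the leading token converted to int gives numeric order.
as_numbers = sorted(cols, key=lambda name: int(name.split()[0]))

print(as_strings)  # ['1 Largest Event', '10 Largest Event', '2 Largest Event']
print(as_numbers)  # ['1 Largest Event', '2 Largest Event', '10 Largest Event']
print('10 Largest Event' < '2 Largest Event')  # True
```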
Sample:
import pandas as pd
cols = ['1 Largest Event',
'10 Largest Event',
'2 Largest Event',
'3 Largest Event',
'4 Largest Event',
'5 Largest Event',
'6 Largest Event',
'7 Largest Event',
'8 Largest Event',
'9 Largest Event']
df = pd.DataFrame(0, columns=cols, index=[0])
print (df)
1 Largest Event 10 Largest Event 2 Largest Event 3 Largest Event \
0 0 0 0 0
4 Largest Event 5 Largest Event 6 Largest Event 7 Largest Event \
0 0 0 0 0
8 Largest Event 9 Largest Event
0 0 0
df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]
print (df)
1 Largest Event 2 Largest Event 3 Largest Event 4 Largest Event \
0 0 0 0 0
5 Largest Event 6 Largest Event 7 Largest Event 8 Largest Event \
0 0 0 0 0
9 Largest Event 10 Largest Event
0 0 0
Edit:
You can also keep the first 3 columns in place and sort only the remaining ones:
df = df[df.columns[:3].tolist() + sorted(df.columns[3:], key=lambda x: int(x.split()[0]))]
print (df)
ID_1 Permit No. ID_2 1 Largest Event 2 Largest Event \
0 10220 To Be Permitted 0010001-24.1 4.0548 3.9611
10 Largest Event
0 0.822
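The slice above hard-codes the position 3. If the number of leading ID columns can vary, one possible alternative is to decide per name whether it starts with an integer. The helper order_columns below is a hypothetical sketch, not a pandas function; it works on a plain list of names and assumes they are all strings.

```python
def order_columns(names):
    """Return names with non-numbered ones first (original order kept),
    followed by the numbered ones sorted by their leading integer."""
    def leading_int(name):
        # First whitespace-separated token, or None if it is not an integer.
        tokens = name.split()
        if tokens and tokens[0].isdecimal():
            return int(tokens[0])
        return None

    fixed = [n for n in names if leading_int(n) is None]
    numbered = [n for n in names if leading_int(n) is not None]
    return fixed + sorted(numbered, key=leading_int)

names = ['ID_1', 'Permit No.', 'ID_2',
         '1 Largest Event', '10 Largest Event', '2 Largest Event']
ordered = order_columns(names)
print(ordered)
```

With a DataFrame, the same helper could be applied as df = df[order_columns(list(df.columns))] before exporting to Excel.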