
Sort Pandas dataframe column index by date

I want to sort a dataframe by its column index. The issue is that my columns are dates (dd/mm/yyyy) imported directly from my Excel file. For example:

    10/08/20  12/08/20 11/08/20
0   2.0        6.0       15.0
1   6.0        11.0      8.0
2   4.0        7.0       3.0
3   7.0        12.0      2.0
4   12.0       5.0       7.0

The output I want is:

    10/08/20  11/08/20 12/08/20
0   2.0        15.0      6.0
1   6.0        8.0       11.0
2   4.0        3.0       7.0
3   7.0        2.0       12.0
4   12.0       7.0       5.0

I am using

df.sort_index(axis=1)

It gives me the following error:

TypeError: '<' not supported between instances of 'datetime.datetime' and 'str'

I want to do it in a pandas dataframe. Any help will be appreciated. Thanks
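The error can be reproduced with a small, hypothetical frame whose column labels mix one real datetime (which an Excel import can produce for some cells) with plain strings. This is a sketch of the likely cause, not the asker's actual data:

```python
import datetime as dt

import pandas as pd

# Hypothetical reproduction: one label is a real datetime, the others are strings.
df = pd.DataFrame(
    [[2.0, 6.0, 15.0], [6.0, 11.0, 8.0]],
    columns=[dt.datetime(2020, 8, 10), "12/08/20", "11/08/20"],
)

caught = None
try:
    df.sort_index(axis=1)
except TypeError as exc:
    # A datetime cannot be compared with a str, so sorting the labels fails.
    caught = exc
    print("TypeError:", exc)
```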

First remove the '.' at the end of each date in the source data sheet. Then, for this data:

    10-08-2020  12-08-2020  11-08-2020
0   2           6           15
1   6           11          8
2   4           7           3
3   7           12          2
4   12          5           7

try this:

import datetime as dt
import pandas as pd
# Parse each label; the format must match the labels ('%d/%m/%Y' if they use slashes).
df.columns = pd.Series(df.columns).apply(lambda d: dt.datetime.strptime(d, '%d-%m-%Y'))
df = df.sort_index(axis=1)
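As a vectorized alternative to calling strptime per label, pd.to_datetime can convert the whole column index in one step. A minimal sketch, assuming the labels are dd-mm-yyyy strings like the data above:

```python
import pandas as pd

# Example frame mirroring the data above (labels are dd-mm-yyyy strings).
df = pd.DataFrame(
    [[2, 6, 15], [6, 11, 8], [4, 7, 3], [7, 12, 2], [12, 5, 7]],
    columns=["10-08-2020", "12-08-2020", "11-08-2020"],
)

# Convert every label at once; an explicit day-first format avoids guessing.
df.columns = pd.to_datetime(df.columns, format="%d-%m-%Y")

# Every label is now a Timestamp, so sort_index orders them chronologically.
df = df.sort_index(axis=1)
print(df)
```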

First:

df.columns = df.columns.str.replace(".", "", regex=False)

Then:

df.sort_index(axis = 1)
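Once the dots are gone the labels are still strings, so sort_index orders them lexicographically. That happens to be chronological for the sample (same month and year) but not in general. A sketch with hypothetical labels spanning two months, using the key parameter of sort_index (available since pandas 1.1) to compare them as day-first dates while keeping the string labels:

```python
import pandas as pd

# Hypothetical labels (already stripped of the trailing ".") spanning two months.
df = pd.DataFrame([[1, 2, 3]], columns=["10/08/20", "02/09/20", "11/08/20"])

# Plain string labels sort lexicographically, so "02/09/20" comes first.
lexicographic = list(df.sort_index(axis=1).columns)
print(lexicographic)

# key= converts the labels to datetimes (day first) only for the comparison;
# the column labels themselves stay strings.
chrono = df.sort_index(axis=1, key=lambda idx: pd.to_datetime(idx, format="%d/%m/%y"))
print(list(chrono.columns))
```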

Update: as Ch3steR mentioned in the comments, to remove the trailing "." use:

df.columns = df.columns.str.rstrip(".")

Use str.rstrip to generalize: day.month.year is a valid date format, and str.replace would remove every ".", not just the trailing one.

Example:

s = pd.Series(["1.2.2020."])
pd.to_datetime(s.str.replace('.', ''))
# 0   2020-12-20         # Interpreted wrong
# dtype: datetime64[ns]

pd.to_datetime(s.str.rstrip('.'))
# 0   2020-01-02
# dtype: datetime64[ns]
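Note that the rstrip result above is still parsed month-first (1.2.2020 becomes 2020-01-02, i.e. 2 January). If the sheet really holds day.month.year dates, passing an explicit format removes the ambiguity. A small sketch, assuming that day-first layout:

```python
import pandas as pd

s = pd.Series(["1.2.2020.", "13.2.2020."])

# Strip only the trailing dot, then parse with an explicit day.month.year format.
parsed = pd.to_datetime(s.str.rstrip("."), format="%d.%m.%Y")
print(parsed)
```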

Your error comes from mixing string and date types in the column labels. Either all your column names are strings or all are dates; you cannot mix the two.

For example:

import pandas as pd

l=[[2.0, 6.0, 15.0],
   [6.0, 11.0, 8.0],
   [4.0, 7.0, 3.0],
   [7.0, 12.0, 2.0],
   [12.0, 5.0, 7.0]]

d = pd.DataFrame(l, columns =['10/08/20',  '12/08/20', '11/08/20']) # column names are strings

yields:

   10/08/20  12/08/20  11/08/20
0       2.0       6.0      15.0
1       6.0      11.0       8.0
2       4.0       7.0       3.0
3       7.0      12.0       2.0
4      12.0       5.0       7.0

Now, to sort by column names, I type:

d.sort_index(axis = 1)

   10/08/20  11/08/20  12/08/20
0       2.0      15.0       6.0
1       6.0       8.0      11.0
2       4.0       3.0       7.0
3       7.0       2.0      12.0
4      12.0       7.0       5.0

If, on the other hand, the column names were dates, as in:

from dateutil.parser import parse
d = pd.DataFrame(l, columns=[parse('10/08/20', dayfirst=True), parse('12/08/20', dayfirst=True), parse('11/08/20', dayfirst=True)])

(dayfirst=True makes parse read 10/08/20 as 10 August 2020; by default it assumes month first and would give 2020-10-08.)

we will have:

   2020-08-10  2020-08-12  2020-08-11   # now column names are dates
0         2.0         6.0        15.0
1         6.0        11.0         8.0
2         4.0         7.0         3.0
3         7.0        12.0         2.0
4        12.0         5.0         7.0

Again, you can sort them the same way:

d.sort_index(axis=1)

   2020-08-10  2020-08-11  2020-08-12
0         2.0        15.0         6.0
1         6.0         8.0        11.0
2         4.0         3.0         7.0
3         7.0         2.0        12.0
4        12.0         7.0         5.0

and this gives no error.
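When, as in the question, some labels arrive as datetimes and others as strings, one way to follow this advice is to convert every label to a Timestamp before sorting. A minimal sketch; the mixed labels and the to_timestamp helper are hypothetical, and the '%d/%m/%y' format assumes the string labels look like 10/08/20:

```python
import datetime as dt

import pandas as pd

# Hypothetical mixed labels: one datetime and two dd/mm/yy strings.
df = pd.DataFrame([[2.0, 6.0, 15.0]], columns=[dt.datetime(2020, 8, 10), "12/08/20", "11/08/20"])


def to_timestamp(label):
    """Hypothetical helper: keep real datetimes, parse strings day first."""
    if isinstance(label, dt.datetime):
        return pd.Timestamp(label)
    return pd.to_datetime(label, format="%d/%m/%y")


# Give every label the same type, then sort chronologically.
df.columns = pd.DatetimeIndex([to_timestamp(c) for c in df.columns])
df = df.sort_index(axis=1)
print(df)
```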
