Pandas sort on Date in DataFrame not sorting properly

The following code segment reads sample_data.csv into a DataFrame and sorts it by observation_time. The sort() call doesn't order the date/time values chronologically.

import pandas as pd

df = pd.read_csv('C:/data/sample_data.csv')
df = df.sort(['observation_time'])
df.to_csv('C:/data/outfile.csv')

[Using pandas 0.12 from Anaconda, 32-bit Windows]

Input sample_data.csv :

latitude    longitude   vessel_name observation_time
1.031000018 -79.68883514    aqasi           12/28/2012 10:40
1.032833338 -79.71916199    aqasi           12/29/2012 14:06
1.486500025 -80.1906662     aqasi           12/31/2012 4:41
1.466999888 -80.16249847    aqasi           12/31/2012 4:30
2.342833519 -81.46682739    aqasi           12/31/2012 13:40
2.360000134 -81.4936676     aqasi           12/31/2012 13:51
3.816000223 -83.68183899    aqasi           1/1/2013 5:20
3.730499983 -83.55400085    aqasi           1/1/2013 4:24
3.714666843 -83.53016663    aqasi           1/1/2013 4:14
4.986999989 -85.45566559    aqasi           1/1/2013 19:04
6.884333134 -88.21949768    aqasi           1/2/2013 13:11
6.885833263 -88.22200012    aqasi           1/2/2013 13:12
6.886833191 -88.22383881    aqasi           1/2/2013 13:12
6.887333393 -88.22450256    aqasi           1/2/2013 13:13
6.889333248 -88.22800446    aqasi           1/2/2013 13:14

Output outfile.csv :

    latitude    longitude   vessel_name observation_time
9   4.986999989 -85.45566559    aqasi           1/1/2013 19:04
8   3.714666843 -83.53016663    aqasi           1/1/2013 4:14
7   3.730499983 -83.55400085    aqasi           1/1/2013 4:24
6   3.816000223 -83.68183899    aqasi           1/1/2013 5:20
10  6.884333134 -88.21949768    aqasi           1/2/2013 13:11
12  6.886833191 -88.22383881    aqasi           1/2/2013 13:12
11  6.885833263 -88.22200012    aqasi           1/2/2013 13:12
13  6.887333393 -88.22450256    aqasi           1/2/2013 13:13
14  6.889333248 -88.22800446    aqasi           1/2/2013 13:14
0   1.031000018 -79.68883514    aqasi           12/28/2012 10:40
1   1.032833338 -79.71916199    aqasi           12/29/2012 14:06
4   2.342833519 -81.46682739    aqasi           12/31/2012 13:40
5   2.360000134 -81.4936676     aqasi           12/31/2012 13:51
3   1.466999888 -80.16249847    aqasi           12/31/2012 4:30
2   1.486500025 -80.1906662     aqasi           12/31/2012 4:41

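The reason for the 'wrong' sorting is that read_csv loads the observation_time column as strings, so the column is in fact sorted correctly, but as strings. Strings are compared character by character, which is why "1/1/2013 19:04" comes before "1/1/2013 4:14" and every "1/..." row comes before the "12/..." rows.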
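To see the string ordering on its own, here is a minimal sketch that sorts a few of the timestamps from the sample data as plain Python strings. The variable names are only for illustration.

```python
# A few observation_time values copied from the sample data.
stamps = [
    "12/28/2012 10:40",
    "1/1/2013 19:04",
    "1/1/2013 4:14",
    "1/2/2013 13:11",
]

# Plain string sorting compares character by character:
# "/" sorts before "2", so "1/..." precedes "12/...",
# and "1" sorts before "4", so "19:04" precedes "4:14".
as_strings = sorted(stamps)
print(as_strings)
```

The result follows the same pattern as outfile.csv: the January 2013 rows land ahead of the December 2012 rows.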

Converting the column to datetime type before sorting produces the desired chronological order:

df['observation_time'] = pd.to_datetime(df['observation_time'])
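For completeness, here is a self-contained sketch of the conversion followed by the sort. It makes a few assumptions that are not in the original post: the data is inlined as comma-separated text rather than read from C:/data/sample_data.csv, the timestamps are taken to be month/day/year, and the sort uses sort_values, since DataFrame.sort from pandas 0.12 is no longer available in recent pandas releases.

```python
import io

import pandas as pd

# Four rows from the sample data, inlined so the example runs anywhere.
csv_text = """vessel_name,observation_time
aqasi,12/31/2012 4:41
aqasi,1/1/2013 19:04
aqasi,12/28/2012 10:40
aqasi,1/1/2013 4:14
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert the string column to real datetimes.
# The explicit format assumes month/day/year input.
df["observation_time"] = pd.to_datetime(
    df["observation_time"], format="%m/%d/%Y %H:%M"
)

# sort_values replaces the older DataFrame.sort used in the question.
df_sorted = df.sort_values("observation_time")
print(df_sorted)
```

With the column stored as datetimes, the December 2012 rows come first and 4:14 sorts ahead of 19:04 on 1/1/2013.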

Disclaimer: In the spirit of this meta post , I've explicitly added the solution from this and this comment above as an answer, since people tend not to read comments and answers are more permanent (and have the benefit that they can be upvoted/downvoted to indicate usefulness).
