Pandas, sorting days whilst preserving order
I've received a CSV file that is a combination of several other CSV files.
It has a datetime index (in the format '2017-01-16': year, month, day). However, two problems arise.
The combination was not done in date order:
       Date        string       number (different)
1      2017-01-16  stringvalue  90
2      2017-01-16  stringvalue  912
3      2017-01-16  stringvalue  29
4      2017-01-17  stringvalue  883
5      2017-01-17  stringvalue  223
6      2017-01-17  stringvalue  211
(...)
230    2015-04-30  stringvalue  908
231    2015-04-29  stringvalue  28
232    2015-04-29  stringvalue  9
233    2015-04-30  stringvalue  98
234    2015-04-30  stringvalue  909
(...)
450    2017-03-30  stringvalue  348
No time of day has been provided: the day is the finest resolution available, yet each day holds around 10 values that need to be kept in their original order.
I resolved the first problem by performing
df = df.reset_index()
df = df.sort_values('Date')
df = df.set_index('Date')
This correctly orders the index, but messes up the ordering within each day. Is there a way to sort the dates while keeping the original order within each day intact?
You can add a helper column that records each row's original position within its day, then use it as a secondary sort key:
df['G']=df.groupby(level='Date').cumcount()
df
Out[125]:
string number G
Date
2017-01-16 stringvalue 90 0
2017-01-16 stringvalue 912 1
2017-01-16 stringvalue 29 2
2017-01-17 stringvalue 883 0
2017-01-17 stringvalue 223 1
2017-01-17 stringvalue 211 2
2015-04-30 stringvalue 908 0
2017-03-30 stringvalue 348 0
# mergesort is stable, so rows stay in G order within each date
df.sort_values('G').sort_index(kind='mergesort').drop(columns='G')
Out[124]:
string number
Date
2015-04-30 stringvalue 908
2017-01-16 stringvalue 90
2017-01-16 stringvalue 912
2017-01-16 stringvalue 29
2017-01-17 stringvalue 883
2017-01-17 stringvalue 223
2017-01-17 stringvalue 211
2017-03-30 stringvalue 348
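The same idea can be written as a single sort on two keys, which avoids relying on the stability of a second sort. The following is a minimal sketch, not the answerer's exact code; the sample values are made up to mimic the question's layout, and the names `df` and `result` are only illustrative.

```python
import pandas as pd

# Hypothetical sample data: dates out of order, several rows per day
df = pd.DataFrame(
    {
        "Date": ["2017-01-16", "2017-01-16", "2017-01-17",
                 "2015-04-30", "2015-04-29", "2015-04-30"],
        "number": [90, 912, 883, 908, 28, 98],
    }
).set_index("Date")

# Position of each row within its own day (0, 1, 2, ...)
df["G"] = df.groupby(level="Date").cumcount()

# Move Date back to a column, sort on both keys at once, then clean up
result = (
    df.reset_index()
      .sort_values(["Date", "G"])
      .drop(columns="G")
      .set_index("Date")
)
print(result)
```

Because every (Date, G) pair is unique, the result does not depend on which sorting algorithm pandas picks.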
First, convert the Date column to datetime type if needed:
df['Date'] = pd.to_datetime(df.Date)
df = df.reset_index().sort_values(by=['Date', 'index']).drop(['index'], axis=1)
This resets the index, creating a temporary column called index. It then sorts by both the Date and index columns and finally drops the index column, leaving the data frame sorted by Date and, within each date, by the order in which the rows appeared in the original CSV file.
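As a rough end-to-end sketch of the steps above, assuming Date is an ordinary column and the frame still has its default integer index from reading the CSV (the sample values are invented for illustration). The last line shows a swapped-in alternative that is not part of the original answer: a single stable sort on Date, which should give the same row order.

```python
import pandas as pd

# Hypothetical sample data with a default RangeIndex, as read_csv would give
df = pd.DataFrame(
    {
        "Date": ["2017-01-16", "2017-01-16", "2017-01-17",
                 "2015-04-30", "2015-04-29", "2015-04-30"],
        "number": [90, 912, 883, 908, 28, 98],
    }
)
df["Date"] = pd.to_datetime(df["Date"])

# Approach from the answer: expose the row position as a column and
# use it as the tie-breaker within each date
by_position = (
    df.reset_index()
      .sort_values(by=["Date", "index"])
      .drop(columns="index")
)

# Alternative (an assumption, not from the answer): a stable sort keeps
# rows with equal dates in their original relative order
stable = df.sort_values("Date", kind="mergesort")
```

Both frames list the rows date by date with the within-day order untouched; the stable sort simply skips the temporary column.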