
Pandas, sorting days whilst preserving order

I've received a CSV file that is a combination of several other CSV files.

It has a datetime index in the format '2017-01-16' (year-month-day). However, two problems arise:

  1. The combination was not done in order:

                 Date        string       number (different)
            1    2017-01-16  stringvalue   90
            2    2017-01-16  stringvalue  912
            3    2017-01-16  stringvalue   29
            4    2017-01-17  stringvalue  883
            5    2017-01-17  stringvalue  223
            6    2017-01-17  stringvalue  211
            (...)
            230  2015-04-30  stringvalue  908
            231  2015-04-29  stringvalue   28
            232  2015-04-29  stringvalue    9
            233  2015-04-30  stringvalue   98
            234  2015-04-30  stringvalue  909
            (...)
            450  2017-03-30  stringvalue  348

  2. No time of day has been provided. The day is the finest resolution available, yet each day holds around 10 values that need to be kept in their original order.
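The sample rows from the table can be loaded with `pd.read_csv` to reproduce the situation. This is a minimal sketch: the inline CSV text (read through `io.StringIO`) is a hypothetical stand-in for the real combined file, and it contains only the rows shown in the table, with the column names `Date`, `string` and `number` taken from it. Because `read_csv` keeps rows in file order, the default integer index is the only record of each row's original position within its day.

```python
import io

import pandas as pd

# Hypothetical stand-in for the combined CSV file: the rows shown in the question.
CSV_TEXT = """Date,string,number
2017-01-16,stringvalue,90
2017-01-16,stringvalue,912
2017-01-16,stringvalue,29
2017-01-17,stringvalue,883
2017-01-17,stringvalue,223
2017-01-17,stringvalue,211
2015-04-30,stringvalue,908
2015-04-29,stringvalue,28
2015-04-29,stringvalue,9
2015-04-30,stringvalue,98
2015-04-30,stringvalue,909
2017-03-30,stringvalue,348
"""

# parse_dates converts the Date column to datetime64; rows stay in file order.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"])

# The default RangeIndex (0, 1, 2, ...) records each row's position in the file.
print(df.head())
```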

I resolved the first problem by performing:

    df = df.reset_index()
    df = df.sort_values('Date')
    df = df.set_index('Date')

This orders the index correctly, but scrambles the order of the rows within each day. Is there a way to sort by date while keeping the original order within each day intact?
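A likely cause of the scrambled within-day order is that `sort_values` defaults to `kind='quicksort'`, which is not a stable sort, so rows with equal dates may be reordered. A minimal sketch, not taken from the answers, requests a stable algorithm (`kind='mergesort'`) instead. It assumes `Date` is an ordinary column, and the sample rows are taken from the question's table.

```python
import pandas as pd

# Sample rows in the order they appear in the combined CSV (from the question).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-17", "2017-01-17", "2017-01-17",
                            "2015-04-30", "2015-04-29", "2015-04-29",
                            "2015-04-30", "2015-04-30"]),
    "number": [883, 223, 211, 908, 28, 9, 98, 909],
})

# mergesort is stable: rows with the same Date keep their original relative order.
df = df.sort_values("Date", kind="mergesort").set_index("Date")
print(df)
```

If `Date` is already the index, `df.sort_index(kind='mergesort')` should give the same result without the reset/set round trip.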

Add a new column that records each row's original order within its day, and use it when sorting:

    # G numbers the rows of each date in their original order (0, 1, 2, ...)
    df['G'] = df.groupby(level='Date').cumcount()
    df
    Out[125]:
                     string  number  G
    Date                              
    2017-01-16  stringvalue      90  0
    2017-01-16  stringvalue     912  1
    2017-01-16  stringvalue      29  2
    2017-01-17  stringvalue     883  0
    2017-01-17  stringvalue     223  1
    2017-01-17  stringvalue     211  2
    2015-04-30  stringvalue     908  0
    2017-03-30  stringvalue     348  0

    # sort by G, then stably by date so the G order survives within each date,
    # and drop the helper column
    df.sort_values('G').sort_index(kind='mergesort').drop(columns='G')
    Out[124]:
                     string  number
    Date                           
    2015-04-30  stringvalue     908
    2017-01-16  stringvalue      90
    2017-01-16  stringvalue     912
    2017-01-16  stringvalue      29
    2017-01-17  stringvalue     883
    2017-01-17  stringvalue     223
    2017-01-17  stringvalue     211
    2017-03-30  stringvalue     348
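The same helper-column idea can also be written as a single sort on the `Date` index level and `G` together. Because each (`Date`, `G`) pair is unique, the result does not depend on either sort being stable. This self-contained sketch assumes a pandas version whose `sort_values` accepts index level names in `by` (0.23 or later) and uses a few rows from the question's table.

```python
import pandas as pd

# Unsorted sample with Date as the index, in original CSV order.
df = pd.DataFrame(
    {"number": [90, 912, 29, 908, 28, 9, 98, 909]},
    index=pd.Index(pd.to_datetime(["2017-01-16", "2017-01-16", "2017-01-16",
                                   "2015-04-30", "2015-04-29", "2015-04-29",
                                   "2015-04-30", "2015-04-30"]), name="Date"),
)

# G = position of each row within its day (0, 1, 2, ... in original order).
df["G"] = df.groupby(level="Date").cumcount()

# Sort by the index level and the helper column together, then drop the helper.
result = df.sort_values(["Date", "G"]).drop(columns="G")
print(result)
```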
First, convert the `Date` column to datetime type if needed:

    df['Date'] = pd.to_datetime(df.Date)

Then:

    df = df.reset_index().sort_values(by=['Date', 'index']).drop(['index'], axis=1)

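This resets the index, creating a temporary column called `index` that holds each row's original position. It then sorts by both the `Date` and `index` columns and finally drops the `index` column, leaving the data frame sorted by `Date` and, within each date, in the order in which the rows appeared in the original CSV file. This assumes `Date` is a regular column and the frame still has its default integer index; if `Date` is the index, `reset_index()` creates a `Date` column rather than an `index` column.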
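A self-contained run of the `reset_index` approach follows. As in the answer, it assumes `Date` is a regular column holding strings and the frame has the default integer index; the sample rows come from the question's table.

```python
import pandas as pd

# Rows in original CSV order; Date is a regular column with string values.
df = pd.DataFrame({
    "Date": ["2017-01-17", "2017-01-17", "2017-01-17",
             "2015-04-30", "2015-04-29", "2015-04-29", "2015-04-30"],
    "number": [883, 223, 211, 908, 28, 9, 98],
})

# Convert the strings to datetime so the sort is chronological.
df["Date"] = pd.to_datetime(df.Date)

# reset_index() adds an "index" column holding each row's original position;
# sorting on (Date, index) keeps rows of the same day in file order.
df = df.reset_index().sort_values(by=["Date", "index"]).drop(["index"], axis=1)
print(df)
```

The resulting index is a permutation of the old positions; appending `.reset_index(drop=True)` would renumber it from 0.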
