Sorting by Date string in pandas - Python 2.7

I have .csv data that I want to sort by its date column. My date format is as follows:

Week, Quarter, Year: for example, WK01Q12001.

When I call .sort() on my DataFrame by this column, the result is ordered like this:

WK01Q12001, WK01Q12002, WK01Q12003, WK01Q22001, WK01Q22002, WK01Q22003, ... WK02Q12001, WK02Q12002...

for example. This makes sense because it is sorting the strings in ascending order, character by character from the left.

But I need my data sorted chronologically, so that the result looks like the following:

WK01Q12001, WK02Q12001, WK03Q12001, WK04Q12001, ... , WK01Q22001, WK02Q22001, ... WK01Q12002, WK02Q12002 ...

How can I sort it this way using pandas? Perhaps by sorting the string in reverse (right to left), or by creating some kind of datetime object?

I have also tried using Series(): pd.Series([pd.to_datetime(d) for d in weeklyData['Date']]), but the result is the same as with the .sort() method above.

UPDATE: My DataFrame is similar in format to an Excel sheet and currently looks like the following. I want to sort it chronologically by 'Date'.

Date          Price     Volume
WK01Q12001    32        500
WK01Q12002    43        400
WK01Q12003    55        300
WK01Q12004    58        350
WK01Q22001    33        480
WK01Q22002    40        450
.
.
.
WK13Q42004    60        400

You can add a new column to your DataFrame containing the date components as a list.

e.g.

a = ["2001", "Q2", "WK01"]
b = ["2002", "Q2", "WK01"]
c = ["2002", "Q2", "WK02"]

So, you can apply a function to your DataFrame to do this...

import re

def tolist(x):
    # Split a key like WK01Q12001 into [year, quarter, week].
    g = re.match(r"(WK\d{2})(Q\d)(\d{4})", str(x))
    return [g.group(3), g.group(2), g.group(1)]

then...

 df['datelist'] = df['Date'].apply(tolist)

which gives you your date as a list arranged in order of significance...

         Date  Price  Volume          datelist
0  WK01Q12001     32     500  [2001, Q1, WK01]
1  WK01Q12002     22     400  [2002, Q1, WK01]
2  WK01Q12003     42     500  [2003, Q1, WK01]

Python compares lists element by element, from left to right, so lists of equal length arranged from most to least significant component compare in chronological order. This means you can use the standard DataFrame sort to order your data.
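As a quick check of that claim, here is a minimal sketch in plain Python using the three example lists from above. It relies on the components being zero-padded strings of fixed width; with unpadded values such as "WK2" and "WK10", string comparison would no longer match numeric order.

```python
# Date components ordered most significant first: [year, quarter, week].
a = ["2001", "Q2", "WK01"]
b = ["2002", "Q2", "WK01"]
c = ["2002", "Q2", "WK02"]

# Lists compare element by element, left to right; the first unequal
# pair decides the result.
print(a < b)  # the years differ, so the year decides
print(b < c)  # year and quarter tie, so the week decides

# Sorting a shuffled list of these keys restores chronological order.
ordered = sorted([c, a, b])
print(ordered == [a, b, c])
```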

So the default sort in pandas will work correctly when you do...

df.sort('datelist')
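Note that DataFrame.sort belongs to the older pandas API; it was removed in pandas 0.20 in favour of sort_values. On current versions, one way to apply the same idea (components ordered by significance) is to extract them into integer columns and sort on those. This is only a sketch: the sample rows and the column names week, quarter and year are assumptions made for illustration.

```python
import pandas as pd

# Illustrative rows in the question's layout; the values are made up.
df = pd.DataFrame({
    "Date": ["WK01Q12002", "WK02Q12001", "WK01Q22001", "WK01Q12001"],
    "Price": [43, 35, 33, 32],
})

# Pull week, quarter and year out of each key and convert them to integers.
parts = df["Date"].str.extract(r"WK(\d{2})Q(\d)(\d{4})").astype(int)
parts.columns = ["week", "quarter", "year"]

# Sort by the components from most to least significant.
df_sorted = (
    df.join(parts)
      .sort_values(["year", "quarter", "week"])
      .reset_index(drop=True)
)
print(df_sorted["Date"].tolist())
```

Sorting on integer columns also avoids depending on zero-padding in the original strings.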

Use str.replace to change the order of the keys first:

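import pandas as pd

s = "WK01Q12001, WK01Q12002, WK01Q12003, WK01Q22001, WK01Q22002, WK01Q22003, WK02Q12001, WK02Q12002"
# A list comprehension (rather than map) so that len(date) also works on Python 3.
date = [d.strip() for d in s.split(",")]
df = pd.DataFrame({"date":date, "value":range(len(date))})
# Rewrite WKwwQqyyyy as yyyyQqWKww so that plain string order is chronological.
df["date2"] = df.date.str.replace(r"WK(\d\d)Q(\d)(\d{4})", r"\3Q\2WK\1")
df.sort("date2")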
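On recent pandas versions two details differ from the snippet above: str.replace treats the pattern literally unless regex=True is passed, and DataFrame.sort has been replaced by sort_values. A sketch of the same approach under those assumptions, using a shortened list of sample keys:

```python
import pandas as pd

# A few sample keys in the question's layout (illustrative only).
keys = ["WK01Q12001", "WK01Q12002", "WK01Q22001", "WK02Q12001"]
df = pd.DataFrame({"date": keys, "value": range(len(keys))})

# Reorder each key to yyyyQqWKww; regex=True makes pandas treat the
# pattern as a regular expression with backreferences.
df["date2"] = df["date"].str.replace(
    r"WK(\d\d)Q(\d)(\d{4})", r"\3Q\2WK\1", regex=True
)

# sort_values returns a new, sorted DataFrame.
result = df.sort_values("date2")
print(result["date"].tolist())
```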

I was also able to accomplish this Date reformatting very easily using SQL. When I first queried my data, I ran SELECT *, RIGHT([Date], 4) + SUBSTRING([Date], 5, 2) + LEFT([Date], 4) As 'SortedDate' FROM [Table] ORDER BY 'SortedDate' ASC.

Use the right tool for the job!
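For readers whose data is already in a DataFrame, the same fixed-position slicing can be sketched with pandas string indexing. This is an illustrative equivalent rather than part of the SQL answer; the sample rows are made up, and the column name SortedDate is borrowed from the query above.

```python
import pandas as pd

# Illustrative keys in the question's layout.
df = pd.DataFrame(
    {"Date": ["WK01Q12002", "WK02Q12001", "WK01Q22001", "WK01Q12001"]}
)

# Mirror RIGHT(Date, 4) + SUBSTRING(Date, 5, 2) + LEFT(Date, 4):
# year first, then quarter, then week.
d = df["Date"]
df["SortedDate"] = d.str[-4:] + d.str[4:6] + d.str[:4]

df_sorted = df.sort_values("SortedDate")
print(df_sorted["SortedDate"].tolist())
```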
