Sorting by Date string in pandas - Python 2.7
I have .csv data that I want to sort by its date column. My date format is week, quarter, year, so for example: WK01Q12001.
When I .sort() my dataframe on this column, the result is sorted like this:
WK01Q12001, WK01Q12002, WK01Q12003, WK01Q22001, WK01Q22002, WK01Q22003, ... WK02Q12001, WK02Q12002...
This makes sense, because it is sorting the strings in ascending order.
But I need my data sorted chronologically, so that the result looks like the following:
WK01Q12001, WK02Q12001, WK03Q12001, WK04Q12001, ... , WK01Q22001, WK02Q22001, ... WK01Q12002, WK02Q22002 ...
How can I sort it this way using pandas? Perhaps by sorting the string in reverse (right to left), or by creating some kind of datetime object?
I have also tried using Series():
pd.Series([pd.to_datetime(d) for d in weeklyData['Date']])
But the result is the same as with the .sort() method above.
UPDATE: My DataFrame is similar in format to an Excel sheet and currently looks like the following. I want to sort it chronologically by 'Date'.
Date Price Volume
WK01Q12001 32 500
WK01Q12002 43 400
WK01Q12003 55 300
WK01Q12004 58 350
WK01Q22001 33 480
WK01Q22002 40 450
.
.
.
WK13Q42004 60 400
You can add a new column to your dataframe containing the date components as a list, e.g.
a = ["2001", "Q2", "WK01"]
b = ["2002", "Q2", "WK01"]
c = ["2002", "Q2", "WK02"]
So, you can apply a function to your data frame to do this...
import re

def tolist(x):
    # Split e.g. "WK01Q12001" into its parts and return them most significant first.
    g = re.match(r"(WK\d{2})(Q\d)(\d{4})", str(x))
    return [g.group(3), g.group(2), g.group(1)]
then...
df['datelist'] = df['Date'].apply(tolist)
which gives you your date as a list arranged in order of importance (year, then quarter, then week)...
Date Price Volume datelist
0 WK01Q12001 32 500 [2001, Q1, WK01]
1 WK01Q12002 22 400 [2002, Q1, WK01]
2 WK01Q12003 42 500 [2003, Q1, WK01]
When comparing lists of equal length, Python's comparison operators work element by element from left to right, so a list that starts with an earlier year compares as smaller. That means the standard DataFrame sort will order your data correctly when you do...
df.sort('datelist')
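Note that DataFrame.sort was removed in pandas 0.20 in favour of sort_values. For newer pandas versions, the same idea (year first, then quarter, then week) can be sketched as below. This is only an illustration, not the answer's original method: it swaps the list column for three integer key columns built with str.extract, and the sample rows, prices and the names parts, order and sorted_df are made up for the example.

```python
import pandas as pd

# Made-up sample rows in the question's WKwwQqYYYY format.
df = pd.DataFrame({
    "Date": ["WK01Q12001", "WK01Q12002", "WK02Q12001", "WK01Q22001"],
    "Price": [32, 43, 55, 33],
})

# Pull week, quarter and year out of each code with named capture groups,
# then convert them to integers so they sort numerically.
parts = df["Date"].str.extract(
    r"WK(?P<week>\d{2})Q(?P<quarter>\d)(?P<year>\d{4})"
).astype(int)

# Order the rows by year, then quarter, then week.
order = parts.sort_values(["year", "quarter", "week"]).index
sorted_df = df.loc[order].reset_index(drop=True)

print(sorted_df)
```

Keeping the key columns in a separate frame leaves the original DataFrame's columns unchanged; they could just as well be joined onto df if they are useful later.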
Use str.replace to change the order of the keys first:
s = "WK01Q12001, WK01Q12002, WK01Q12003, WK01Q22001, WK01Q22002, WK01Q22003, WK02Q12001, WK02Q12002"
date = map(str.strip, s.split(","))
df = pd.DataFrame({"date":date, "value":range(len(date))})
df["date2"] = df.date.str.replace(r"WK(\d\d)Q(\d)(\d{4})", r"\3Q\2WK\1")
df.sort("date2")
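The snippet above targets Python 2.7 and an older pandas. A rough equivalent for Python 3 and recent pandas might look like the sketch below, assuming a pandas version where str.replace accepts the regex keyword and sort_values replaces sort. The list comprehension is needed because map no longer returns a list in Python 3; the variable names codes and result are introduced here only for illustration.

```python
import pandas as pd

codes = ("WK01Q12001, WK01Q12002, WK01Q12003, WK01Q22001, "
         "WK01Q22002, WK01Q22003, WK02Q12001, WK02Q12002")
date = [c.strip() for c in codes.split(",")]
df = pd.DataFrame({"date": date, "value": list(range(len(date)))})

# Rewrite "WKwwQqYYYY" as "YYYYQqWKww" so that plain string order is chronological.
df["date2"] = df["date"].str.replace(
    r"WK(\d\d)Q(\d)(\d{4})", r"\3Q\2WK\1", regex=True
)

result = df.sort_values("date2")
print(result)
```

The rearranged key sorts correctly as a string because every component has a fixed width (four-digit year, one-digit quarter, two-digit week).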
I was also able to accomplish this date reformatting very easily using SQL. When I first query my data, I do:
SELECT *, RIGHT([Date], 4) + SUBSTRING([Date], 5, 2) + LEFT([Date], 4) As 'SortedDate' FROM [Table] ORDER BY 'SortedDate' ASC
Use the right tool for the job!
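If the data is already in a DataFrame, the same slicing can be approximated in pandas. The sketch below mirrors the SQL expression (last four characters, then characters 5 to 6, then the first four) using the .str accessor; the sample rows and the result name are assumptions made for the example.

```python
import pandas as pd

# Made-up sample rows, deliberately out of chronological order.
df = pd.DataFrame({"Date": ["WK01Q12002", "WK02Q12001", "WK01Q22001", "WK01Q12001"]})

# Year (last 4 chars) + quarter (chars 5-6) + week (first 4 chars),
# matching RIGHT(..., 4) + SUBSTRING(..., 5, 2) + LEFT(..., 4).
date = df["Date"]
df["SortedDate"] = date.str[-4:] + date.str[4:6] + date.str[:4]

result = df.sort_values("SortedDate").reset_index(drop=True)
print(result)
```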