Reformat a column containing dates in Pandas

Python newbie here, switching from R to Python for statistical modeling and analysis.

I am working with a Pandas data structure and am trying to restructure a column that contains 'date' values. In the data below, you'll notice that some values take the 'Mar-10' format while others take the '12/1/13' format. How can I restructure a column in a Pandas data structure that contains 'dates' (technically not a date type) so that they are uniform, i.e. all follow the same format? I'd prefer that they all follow the 'Mar-10' format. Can anyone help?

In [34]: dat["Date"].unique()
Out[34]: 
array(['Jan-10', 'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10',
       'Jul-10', 'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10',
       'Jan-11', 'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11',
       'Jul-11', 'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11',
       'Jan-12', 'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12',
       'Jul-12', 'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12',
       'Jan-13', 'Feb-13', 'Mar-13', 'Apr-13', 'May-13', '6/1/13',
       '7/1/13', '8/1/13', '9/1/13', '10/1/13', '11/1/13', '12/1/13',
       '1/1/14', '2/1/14', '3/1/14', '4/1/14', '5/1/14', '6/1/14',
       '7/1/14', '8/1/14'], dtype=object)

In [35]: isinstance(dat["Date"], basestring)  # not a string?
Out[35]: False

In [36]: type(dat["Date"]).__name__
Out[36]: 'Series'

I think your dates are already strings. Try:

import numpy as np
import pandas as pd
date = pd.Series(np.array(['Jan-10', 'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10',
       'Jul-10', 'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10',
       'Jan-11', 'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11',
       'Jul-11', 'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11',
       'Jan-12', 'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12',
       'Jul-12', 'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12',
       'Jan-13', 'Feb-13', 'Mar-13', 'Apr-13', 'May-13', '6/1/13',
       '7/1/13', '8/1/13', '9/1/13', '10/1/13', '11/1/13', '12/1/13',
       '1/1/14', '2/1/14', '3/1/14', '4/1/14', '5/1/14', '6/1/14',
       '7/1/14', '8/1/14'], dtype=object))

date.map(type).value_counts()
# date contains 56 strings (Python 2 output shown;
# Python 3 prints <class 'str'> instead of <type 'str'>):
# <type 'str'>    56
# dtype: int64

This shows the type of each individual element, rather than the type of the column that contains them.
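The `isinstance(dat["Date"], basestring)` check in the question only runs on Python 2 (`basestring` no longer exists in Python 3), and it tests the column object rather than its contents. A minimal Python 3 sketch of the per-element check, using a handful of hypothetical sample values in place of the real `dat["Date"]` column:

```python
import pandas as pd

# Hypothetical sample values in both formats seen in the question
date = pd.Series(['Jan-10', 'May-13', '6/1/13', '8/1/14'])

# The column itself is a Series, so it is not a str...
print(isinstance(date, str))                          # False
# ...but every element inside it is a plain Python str
print(date.map(lambda v: isinstance(v, str)).all())   # True
```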

Your best bet for dealing sensibly with them is to convert them into pandas datetime values:

pd.to_datetime(date)
Out[18]: 
0    2014-01-10
1    2014-02-10
2    2014-03-10
3    2014-04-10
4    2014-05-10
5    2014-06-10
6    2014-07-10
7    2014-08-10
8    2014-09-10
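...

Note that this automatic parse gets the 'Jan-10' style values wrong: it reads them as day 10 of January in the current year (2014 when this was written) rather than as January 2010.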

You may have to play around with the formats somewhat, e.g. by parsing each format into a separate series and then merging them back together:

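# Convert the Aug-10 style strings (anything else becomes NaT)
mon_yy = pd.to_datetime(date, format='%b-%y', errors='coerce')
# Convert the 9/1/13 style strings (anything else becomes NaT)
m_d_yy = pd.to_datetime(date, format='%m/%d/%y', errors='coerce')
# Merge: keep the first parse where it worked, otherwise use the second
dates = mon_yy.fillna(m_d_yy)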
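To get all the way back to the uniform 'Mar-10' style the question asks for, the parsed values can be written out again with `Series.dt.strftime`. A minimal end-to-end sketch, assuming a short hypothetical sample instead of the full column and an English (or C) locale so that `%b` produces 'Jan', 'Feb', and so on:

```python
import pandas as pd

# Hypothetical sample mixing the two formats from the question
date = pd.Series(['Jan-10', 'Mar-10', 'May-13', '6/1/13', '12/1/13', '8/1/14'])

# Parse each format separately; values that don't match become NaT
mon_yy = pd.to_datetime(date, format='%b-%y', errors='coerce')
m_d_yy = pd.to_datetime(date, format='%m/%d/%y', errors='coerce')

# Fill the gaps left by the first parse with the results of the second
parsed = mon_yy.fillna(m_d_yy)

# Any remaining NaT would mean a value matched neither format
print(parsed.isna().any())   # False

# Write every value back out in the 'Mar-10' style
uniform = parsed.dt.strftime('%b-%y')
print(uniform.tolist())
# ['Jan-10', 'Mar-10', 'May-13', 'Jun-13', 'Dec-13', 'Aug-14']
```

Keeping the intermediate `parsed` series is usually more useful for modelling than the reformatted strings, since datetime values sort and compare chronologically.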

I can never remember these time formatting codes off the top of my head, but there's a good rundown of them here.
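The two format strings used above can be tried out with the standard library's `datetime.strptime`, which uses the same codes. A small sketch, assuming an English (or C) locale for the month abbreviations:

```python
from datetime import datetime

# %b = abbreviated month name, %y = two-digit year
mar_10 = datetime.strptime('Mar-10', '%b-%y')
# %m = month number, %d = day of the month
# (leading zeros are optional for these when parsing)
jun_13 = datetime.strptime('6/1/13', '%m/%d/%y')
# strftime applies the same codes in reverse, producing the 'Mar-10' style
print(mar_10, jun_13, jun_13.strftime('%b-%y'))
# 2010-03-01 00:00:00 2013-06-01 00:00:00 Jun-13
```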