
Reformat a column containing dates in Pandas

Python newbie here who's switching from R to Python for statistical modeling and analysis.

I am working with a Pandas data structure and am trying to restructure a column that contains 'date' values. In the data below, you'll notice that some values take the 'Mar-10' format while others take the '12/1/13' format. How can I restructure a column in a Pandas data structure that contains 'dates' (technically not a date structure) so that the values are uniform (all have the same structure)? I'd prefer that they all follow the 'Mar-10' format. Can anyone help?

In [34]: dat["Date"].unique()
Out[34]: 
array(['Jan-10', 'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10',
       'Jul-10', 'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10',
       'Jan-11', 'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11',
       'Jul-11', 'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11',
       'Jan-12', 'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12',
       'Jul-12', 'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12',
       'Jan-13', 'Feb-13', 'Mar-13', 'Apr-13', 'May-13', '6/1/13',
       '7/1/13', '8/1/13', '9/1/13', '10/1/13', '11/1/13', '12/1/13',
       '1/1/14', '2/1/14', '3/1/14', '4/1/14', '5/1/14', '6/1/14',
       '7/1/14', '8/1/14'], dtype=object)

In [35]: isinstance(dat["Date"], basestring)  # not a string?
Out[35]: False

In [36]: type(dat["Date"]).__name__
Out[36]: 'Series'

I think your dates are already strings. Try:

import numpy as np
import pandas as pd
date = pd.Series(np.array(['Jan-10', 'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10',
       'Jul-10', 'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10',
       'Jan-11', 'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11',
       'Jul-11', 'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11',
       'Jan-12', 'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12',
       'Jul-12', 'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12',
       'Jan-13', 'Feb-13', 'Mar-13', 'Apr-13', 'May-13', '6/1/13',
       '7/1/13', '8/1/13', '9/1/13', '10/1/13', '11/1/13', '12/1/13',
       '1/1/14', '2/1/14', '3/1/14', '4/1/14', '5/1/14', '6/1/14',
       '7/1/14', '8/1/14'], dtype=object))

date.map(type).value_counts()
# date contains 56 strings
# <type 'str'>    56
# dtype: int64

This shows the type of each individual element, rather than the type of the column that contains them.
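On Python 3 `basestring` no longer exists, so the equivalent check uses `str`. A minimal sketch on a two-element sample taken from the question's values, contrasting the type of the Series with the types of its elements:

```python
import pandas as pd

# Two values, one in each of the styles from the question
date = pd.Series(['May-13', '6/1/13'], dtype=object)

# The Series is a container, not a string, so this prints False
print(isinstance(date, str))

# Each element, however, is a plain Python str, so this prints True
print(date.map(lambda v: isinstance(v, str)).all())
```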

Your best bet for dealing sensibly with them is to convert them into pandas datetime objects:

pd.to_datetime(date)
Out[18]: 
0    2014-01-10
1    2014-02-10
2    2014-03-10
3    2014-04-10
4    2014-05-10
5    2014-06-10
6    2014-07-10
7    2014-08-10
8    2014-09-10
...

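Note that the default parser has misread the 'Mar-10' style values: 'Jan-10' came out as 10 January 2014 (presumably the year the code was run) rather than January 2010. So you may have to play around with the formats somewhat, e.g. parsing each format separately and then merging the results back together: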

# Convert the Aug-10 style strings (non-matching values become NaT)
pd.to_datetime(date, format='%b-%y', errors='coerce')
# Convert the 9/1/13 style strings (non-matching values become NaT)
pd.to_datetime(date, format='%m/%d/%y', errors='coerce')
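To finish the merge described above and get back to the 'Mar-10' style the question asks for, here is a minimal sketch. It assumes a small hypothetical subset of the question's values and an English/C locale (month abbreviations from `%b` are locale-dependent). Each format is parsed separately, the gaps in the first result are filled from the second, and the combined datetimes are formatted back to strings with `dt.strftime`:

```python
import pandas as pd

# Hypothetical sample mixing both string styles from the question
date = pd.Series(['Apr-13', 'May-13', '6/1/13', '7/1/13'])

# Parse each style separately; values that don't match become NaT
month_year = pd.to_datetime(date, format='%b-%y', errors='coerce')
slashed = pd.to_datetime(date, format='%m/%d/%y', errors='coerce')

# Keep the first parse where it succeeded, otherwise fall back to the second
parsed = month_year.fillna(slashed)

# Render every value in the 'Mar-10' style
uniform = parsed.dt.strftime('%b-%y')
print(uniform.tolist())  # ['Apr-13', 'May-13', 'Jun-13', 'Jul-13']
```

Keeping `parsed` around as a real datetime column is usually more useful for modelling than the formatted strings, since it sorts and resamples correctly.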

I can never remember these time-formatting codes off the top of my head, but there's a good rundown of them here.
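A quick way to sanity-check the two format codes used above is to round-trip single values with the standard library's `datetime.strptime` and `strftime`. This sketch assumes an English/C locale, since `%b` month names depend on the locale:

```python
from datetime import datetime

# '%b' = abbreviated month name, '%y' = two-digit year (day defaults to 1)
print(datetime.strptime('Aug-10', '%b-%y'))     # 2010-08-01 00:00:00
# '%m' = month number, '%d' = day of month; single digits are accepted
print(datetime.strptime('9/1/13', '%m/%d/%y'))  # 2013-09-01 00:00:00

# strftime goes the other way, producing the 'Mar-10' style
print(datetime(2010, 3, 1).strftime('%b-%y'))   # Mar-10
```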
