Set multilevel index changes date to datetime in pandas dataframe

I have a pandas DataFrame containing a datetime.date column. When I set a multilevel index, the values in the date column are converted to datetime.datetime objects, which does not happen when I set a single-level index. Is this normal behavior? How can I define a multilevel index while keeping the date type?

import datetime
import pandas as pd

values = [("a", datetime.date(2015, 1, 1), 30.),
          ("a", datetime.date(2015, 1, 2), 25.)]
columns = ["id", "date", "amount"]
df = pd.DataFrame(values, columns=columns)
df_single = df.set_index("date")
df_multi = df.set_index(["id", "date"])

Here is the output:

print(df_multi.index)
# MultiIndex(levels=[['a'], [2015-01-01 00:00:00, 2015-01-02 00:00:00]],
#            labels=[[0, 0], [0, 1]],
#            names=['id', 'date'])

print(df_single.index)
# Index([2015-01-01, 2015-01-02], dtype='object', name='date')

For information, I'm using the following versions:

  • Python 3.4.5 |Anaconda 2.3.0
  • pandas==0.19.2

Let's start with your second question:

How can I define a multilevel index keeping the date type?

Workaround:

A single level of a MultiIndex can be replaced after the index has been built. In your example, once the multilevel index is set, the datetime level can be swapped back to the original date values like this:

df_multi.index.set_levels([df['date'].values], level=[1], inplace=True)

Workaround Result:

>>> print(df_multi.index)
MultiIndex(levels=[[u'a'], [2015-01-01, 2015-01-02]],
           labels=[[0, 0], [0, 1]],
           names=[u'id', u'date'])
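
The one-liner above passes the raw column as the new level, so it relies on the dates being unique and in the same order as the existing level. It also uses the inplace argument, which worked on the pandas version in the question but, as far as I know, was dropped from set_levels in later releases. A minimal sketch of a variant that avoids both assumptions is shown below: it converts each entry of the existing level to a date and assigns the resulting index back. The name date_level is only illustrative.

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    [("a", datetime.date(2015, 1, 1), 30.0),
     ("a", datetime.date(2015, 1, 2), 25.0)],
    columns=["id", "date", "amount"],
)
df_multi = df.set_index(["id", "date"])

# Convert every entry of the existing "date" level to a plain datetime.date.
# Going through pd.Timestamp works whether the level currently holds
# Timestamp values or datetime.date values, and it keeps the level order,
# so the integer codes of the MultiIndex stay valid.
old_level = df_multi.index.levels[1]
date_level = pd.Index([pd.Timestamp(v).date() for v in old_level], dtype=object)

# set_levels returns a new MultiIndex; assign it back instead of using inplace.
df_multi.index = df_multi.index.set_levels(date_level, level="date")

print([type(d) for d in df_multi.index.get_level_values("date")])
```

Forcing dtype=object on the new level is the design choice that matters here: it is what stops pandas from inferring a datetime type for the replacement values.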

Why?

To your first question:

Is this a normal behavior?

It is normal in the sense that the code does this deterministically. The behavior is a side effect of pandas.core.categorical.Categorical(), which ends up promoting the date values to datetime64 via:

values = _possibly_infer_to_datetimelike(values, convert_dates=True)
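
To make "promoting the date to a datetime64" concrete, here is a small illustration using only public functions. It is not the internal code path quoted above: pd.to_datetime is used as a stand-in for the inference step, and the variable names are made up for the example.

```python
import datetime
import pandas as pd

dates = [datetime.date(2015, 1, 1), datetime.date(2015, 1, 2)]

# Explicit conversion: the dates become Timestamp values backed by datetime64.
promoted = pd.to_datetime(dates)

# Forcing object dtype keeps the original datetime.date instances untouched.
kept = pd.Index(dates, dtype=object)

print(type(promoted).__name__, type(promoted[0]).__name__)
print(type(kept[0]).__name__)
```

The promoted values are pandas Timestamp objects, which subclass datetime.datetime, and that is why the multilevel index in the question prints a time component.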

I do not know whether the effect you are seeing is by design, but you could open an issue with the pandas project.
