Pandas to_datetime with multiindex

Question

How can I drop a level in multi-indexed columns when converting three columns to datetime? The example below contains only three columns, but my actual DataFrame has more columns, of course, and those other columns have names on both levels.

    >>> import pandas as pd
    >>> df = pd.DataFrame([[2010, 1, 2],[2011,1,3],[2012,2,3]])
    >>> df.columns = [['year', 'month', 'day'],['y', 'm', 'd']]
    >>> print(df)
       year month day
          y     m   d
    0  2010     1   2
    1  2011     1   3
    2  2012     2   3
    >>> pd.to_datetime(df[['year', 'month', 'day']])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 512, in to_datetime
        result = _assemble_from_unit_mappings(arg, errors=errors)
      File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in _assemble_from_unit_mappings
        unit = {k: f(k) for k in arg.keys()}
      File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in <dictcomp>
        unit = {k: f(k) for k in arg.keys()}
      File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 577, in f
        if value.lower() in _unit_map:
    AttributeError: 'tuple' object has no attribute 'lower'
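
The AttributeError happens because, with MultiIndex columns, every column label is a tuple such as ('year', 'y'). As the traceback shows, to_datetime calls .lower() on each label while matching it to a date unit, and a tuple has no such method. A minimal check of the labels, rebuilding the example frame with pd.MultiIndex.from_arrays (an equivalent way to create the two column levels):

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
df.columns = pd.MultiIndex.from_arrays([["year", "month", "day"], ["y", "m", "d"]])

# Each column label is a (level-0, level-1) tuple, not a plain string,
# so to_datetime cannot call .lower() on it
labels = list(df[["year", "month", "day"]].columns)
print(labels)  # [('year', 'y'), ('month', 'm'), ('day', 'd')]
```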

Edit: adding more columns to explain better:

    >>> df = pd.DataFrame([[2010, 1, 2, 10, 2],[2011,1,3,11,3],[2012,2,3,12,2]])
    >>> df.columns = [['year', 'month', 'day', 'temp', 'wind_speed'],['', '', '', 'degc','m/s']]
    >>> print(df)
       year month day temp wind_speed
                      degc        m/s
    0  2010     1   2   10          2
    1  2011     1   3   11          3
    2  2012     2   3   12          2

What I need is to combine the first three columns into a datetime index, leaving the last two columns as data.

Answer 1

Use droplevel to remove the second level:

    df.columns = df.columns.droplevel(1)
    df = pd.to_datetime(df[['year', 'month', 'day']])
    print(df)
    0   2010-01-02
    1   2011-01-03
    2   2012-02-03
    dtype: datetime64[ns]
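
If you would rather not overwrite df.columns, DataFrame.droplevel with axis=1 (available in pandas 0.24 and later) removes the level from the selected sub-frame only. A minimal sketch, assuming the question's first example frame; dates is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
df.columns = pd.MultiIndex.from_arrays([["year", "month", "day"], ["y", "m", "d"]])

# Drop the second column level on the three date columns only;
# df itself keeps its two-level columns
dates = pd.to_datetime(df[["year", "month", "day"]].droplevel(1, axis=1))
print(dates)
```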

If the DataFrame has only these 3 columns:

    df.columns = df.columns.droplevel(1)
    df = pd.to_datetime(df)
    print(df)
    0   2010-01-02
    1   2011-01-03
    2   2012-02-03
    dtype: datetime64[ns]

If there are more columns:

    df = pd.DataFrame([[2010, 1, 2, 3], [2011, 1, 3, 5], [2012, 2, 3, 7]])
    df.columns = [['year', 'month', 'day', 'a'], ['y', 'm', 'd', 'b']]
    print(df)
       year month day  a
          y     m   d  b
    0  2010     1   2  3
    1  2011     1   3  5
    2  2012     2   3  7

    # select the datetime columns only
    df1 = df[['year', 'month', 'day']]
    df1.columns = df1.columns.droplevel(1)
    print(df1)
       year  month  day
    0  2010      1    2
    1  2011      1    3
    2  2012      2    3

    # convert to a Series
    s1 = pd.to_datetime(df1)
    # give the Series a tuple name so it becomes a two-level column label when joined
    s1.name = ('date', 'dat')
    print(s1)
    0   2010-01-02
    1   2011-01-03
    2   2012-02-03
    Name: (date, dat), dtype: datetime64[ns]

    # remove the original columns and join the new datetime Series
    df = df.drop(['year', 'month', 'day'], axis=1, level=0).join(s1)
    print(df)
       a       date
       b        dat
    0  3 2010-01-02
    1  5 2011-01-03
    2  7 2012-02-03
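
The question asks for the dates as the index rather than as a column. One way to get that, sketched here on the question's edited frame with temp and wind_speed (out and the index name "date" are hypothetical choices), is to drop the three date columns and assign the converted Series as a DatetimeIndex:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2, 10, 2], [2011, 1, 3, 11, 3], [2012, 2, 3, 12, 2]])
df.columns = pd.MultiIndex.from_arrays(
    [["year", "month", "day", "temp", "wind_speed"], ["", "", "", "degc", "m/s"]]
)

date_cols = ["year", "month", "day"]
# Build the datetime Series from the date columns with the second level removed
dates = pd.to_datetime(df[date_cols].droplevel(1, axis=1))

# Keep only the data columns and use the dates as the row index
out = df.drop(columns=date_cols, level=0)
out.index = pd.DatetimeIndex(dates, name="date")
print(out)
```

The remaining columns keep their two-level names, so out[('temp', 'degc')] still selects the temperature data.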

Another solution uses a transpose; it will likely be slower on a large DataFrame. Here df is the question's edited DataFrame with the temp and wind_speed columns:

    df1 = df[['year', 'month', 'day']]
    s1 = pd.to_datetime(df1.T.reset_index(drop=True, level=1).T).rename(('date', 'dat'))
    print(s1)
    0   2010-01-02
    1   2011-01-03
    2   2012-02-03
    Name: (date, dat), dtype: datetime64[ns]

    df1 = df.join(s1)
    print(df1)
       year month day temp wind_speed       date
                      degc        m/s        dat
    0  2010     1   2   10          2 2010-01-02
    1  2011     1   3   11          3 2011-01-03
    2  2012     2   3   12          2 2012-02-03
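
The transpose route and the droplevel route should yield the same Series; they differ only in how the second column level is removed. A quick sanity check, again assuming the question's edited frame (via_droplevel and via_transpose are hypothetical names):

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2, 10, 2], [2011, 1, 3, 11, 3], [2012, 2, 3, 12, 2]])
df.columns = pd.MultiIndex.from_arrays(
    [["year", "month", "day", "temp", "wind_speed"], ["", "", "", "degc", "m/s"]]
)
df1 = df[["year", "month", "day"]]

# Route 1: drop the second column level directly
via_droplevel = pd.to_datetime(df1.droplevel(1, axis=1))
# Route 2: transpose, drop level 1 of the (former column) index, transpose back
via_transpose = pd.to_datetime(df1.T.reset_index(level=1, drop=True).T)

print(via_droplevel.equals(via_transpose))  # True
```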

Answer 1: score 3, accepted, 2017-10-15 13:44:35