pandas.read_csv: how do I parse two columns as datetimes in a hierarchically-indexed CSV?

Question

I have a CSV file that, simplified, looks like this:

X,,Y,,Z,
Date,Time,A,B,A,B
2017-01-21,01:57:49.390,0,1,2,3
2017-01-21,01:57:50.400,4,5,7,9
2017-01-21,01:57:51.410,3,2,4,1

The first two columns are date and time. When I run

pandas.read_csv('foo.csv', header=[0,1])

I get the following DataFrame:

            X Unnamed: 1_level_0  Y Unnamed: 3_level_0  Z Unnamed: 5_level_0
         Date               Time  A                  B  A                  B
0  2017-01-21       01:57:49.390  0                  1  2                  3
1  2017-01-21       01:57:50.400  4                  5  7                  9
2  2017-01-21       01:57:51.410  3                  2  4                  1

Ignoring the annoying unnamed entries in the columns for now, I'd like to combine the first two columns into a single datetime. So I tried using the parse_dates argument:

pandas.read_csv('foo.csv', header=[0,1], parse_dates={'datetime': [0,1]})

But all I get from this is a traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1585, in read
    names, data = self._do_date_conversions(names, data)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1364, in _do_date_conversions
    self.index_names, names, keep_date_col=self.keep_date_col)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 2737, in _process_date_conversion
    data_dict.pop(c)
KeyError: "('X', 'Date')"

I'm not sure why it's hitting a KeyError on ('X', 'Date'), since that label is definitely present in the columns. I don't know whether this is a bug in pandas that I should report (I'm using 0.19.2) or whether I'm just misunderstanding something. Any ideas?

Answer 1

You can work around it if needed:

import datetime as dt
import pandas as pd

# read in the csv file
df = pd.read_csv('foo.csv', header=[0, 1])

# get a label for the funky column names
date_label, time_label = tuple(df.columns.values)[0:2]

# merge the columns into a single datetime
dates = [
    dt.datetime.strptime('T'.join(ts) + '000', '%Y-%m-%dT%H:%M:%S.%f')
    for ts in zip(df[date_label], df[time_label])]

# save the new column
df['DateTime'] = pd.Series(dates).values
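
The same merge can probably be done without the explicit strptime loop. The following is a minimal sketch, assuming the file has exactly the layout shown in the question; the sample is inlined through io.StringIO so the snippet runs on its own, and the names csv_text and combined are only illustrative:

```python
import io

import pandas as pd

# Inline copy of the sample file from the question (assumed layout).
csv_text = """X,,Y,,Z,
Date,Time,A,B,A,B
2017-01-21,01:57:49.390,0,1,2,3
2017-01-21,01:57:50.400,4,5,7,9
2017-01-21,01:57:51.410,3,2,4,1
"""

# Read with the two-row header, exactly as in the question.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# The first two column labels are tuples such as ('X', 'Date').
date_label, time_label = df.columns[:2]

# Join the date and time strings and let pandas parse them in one pass.
combined = pd.to_datetime(df[date_label] + ' ' + df[time_label])

# Store the result as a new column, as in the loop-based version.
df['DateTime'] = combined
```

This lets pd.to_datetime infer the format, which should be fine here because every row uses the same layout; if the real file mixes formats, passing an explicit format string is the safer choice.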

Update:

I have submitted a bug report and a pull request for this issue. In response to the bug report, jreback (pandas lead maintainer) gave a fairly detailed explanation of the problems with the multi-level header in the example. I think you are already aware of these issues, but you may want to read what he wrote. At the end of his response he included this bit, which may provide a workaround:

Making a single level is just not useful in a multi-level frame. I would probably do this:

In [25]: pandas.read_csv(StringIO(data), header=0, skiprows=1, parse_dates={'datetime':[0,1]})
Out[25]: 
                 datetime  A  B  A.1  B.1
0 2017-01-21 01:57:49.390  0  1    2    3
1 2017-01-21 01:57:50.400  4  5    7    9
2 2017-01-21 01:57:51.410  3  2    4    1
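
The session above relies on passing a dict to parse_dates. As far as I know, recent pandas releases (2.2 and later) deprecate combining columns that way, so here is a hedged sketch of the same flat read with the combination done afterwards through pd.to_datetime. The names csv_text, flat and stamp are illustrative, and the inlined sample assumes the layout from the question:

```python
import io

import pandas as pd

# Inline copy of the sample file from the question (assumed layout).
csv_text = """X,,Y,,Z,
Date,Time,A,B,A,B
2017-01-21,01:57:49.390,0,1,2,3
2017-01-21,01:57:50.400,4,5,7,9
2017-01-21,01:57:51.410,3,2,4,1
"""

# Skip the top header row and use the second row as a flat header.
flat = pd.read_csv(io.StringIO(csv_text), header=0, skiprows=1)

# Build the timestamp from the two string columns.
stamp = pd.to_datetime(flat['Date'] + ' ' + flat['Time'])

# Replace the two source columns with a single leading datetime column.
flat = flat.drop(columns=['Date', 'Time'])
flat.insert(0, 'datetime', stamp)
```

As in the quoted output, the duplicate A and B labels come back as A.1 and B.1 because the X/Y/Z level is discarded by the flat read.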

Answer 1: accepted, 2017-02-12 07:15:19
