
Reading data from csv into pandas when date and time are in separate columns

I looked at the answer to this question: Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think I'm doing something subtly wrong.

I've got data in .csv files, which I'm trying to read using the pandas read_csv function. Date and time are in two separate columns, but I want to merge them into one column, "Datetime", containing datetime objects. The csv looks like this:

    Note about the data
    blank line
    Site Id,Date,Time,WTEQ.I-1...
    2069, 2008-01-19, 06:00, -99.9...
    2069, 2008-01-19, 07:00, -99.9...
    ...

I'm trying to read it using this line of code:

    read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"])

However, when I write it back out to a csv, it looks exactly the same (except that the -99.9 values are changed to NA, as I specified with the na_values argument). Date and time are still in two separate columns. As I understand it, this should create a new column called Datetime, composed of columns 1 and 2 and parsed using the date_parser. I have also tried parse_dates={"Datetime" : ["Date","Time"]}, parse_dates=[[1,2]], and parse_dates=[["Date", "Time"]]. I have also tried date_parser=parse, where parse is defined as:

    parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')

None of these has made the least bit of difference, which makes me suspect that there's some deeper problem. Any insight into what it might be?

You should update your pandas. I recommend the latest stable version for the latest features and bug fixes.

This specific feature was introduced in 0.8.0, and it works on pandas version 0.11:

    In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
    Out[11]:
                 Datetime  Site Id  WTEQ.I-1
    0 2008-01-19 06:00:00     2069       NaN
    1 2008-01-19 07:00:00     2069       NaN

This works without date_parser=True: date_parser is expected to be a parsing function, not a boolean (see the docstring).

Note that in the example above, the resulting "Datetime" column is an ordinary column (a Series of its own), not the index of the DataFrame. If you would rather have the datetime values as the index instead of the default integer index, pass the index_col argument specifying the desired column; in this case 0, since the resulting "Datetime" column is the first one.

    In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"])
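
For readers on a much newer pandas than 0.11, here is a minimal sketch of a different route to the same result: read the file with Date and Time as plain string columns, then combine them with pd.to_datetime afterwards. This is a swapped-in technique, not the parse_dates approach shown above; my understanding is that recent pandas releases deprecate date_parser and the nested form of parse_dates, so treat this as one possible alternative rather than the definitive way. The in-memory sample text and the skipinitialspace=True argument are assumptions made for illustration (the sample rows in the question have a space after each comma).

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV in the question:
# two non-data lines, then the header row, then the readings.
raw = """Note about the data
(second non-data line)
Site Id,Date,Time,WTEQ.I-1
2069, 2008-01-19, 06:00, -99.9
2069, 2008-01-19, 07:00, -99.9
"""

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=2,               # skip the two non-data lines before the header
    skipinitialspace=True,    # assumption: strip the space after each comma
    na_values=["-99.9"],      # treat the sentinel value as missing
)

# Join the two string columns and parse them with an explicit format.
df["Datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M"
)

# Drop the original columns and use the combined column as the index.
df = df.drop(columns=["Date", "Time"]).set_index("Datetime")
print(df)
```

Setting the index at the end mirrors the index_col=0 variant above; leave out set_index to keep "Datetime" as a regular column.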
