Python Pandas datetime and multiindex issue

I have a Python script. After running various commands to import, transpose and process data from a CSV file, I end up with a dataframe that looks like this:

        PV          PV
Date    30/11/2016  01/12/2016 
00:30   4           4
01:00   5           1
01:30   6           7
etc

What I want now is to remove the column for 30/11/2016, leaving only the data for 01/12/2016. This is the code I have:

# create MultiIndex.from_arrays from first row of DataFrame first, then remove first row 
# by df.iloc
df.columns = pd.MultiIndex.from_arrays([df.columns, pd.to_datetime(df.iloc[0])])
df = df.iloc[1:]

# get today's date minus 60 mins. the minus 60 mins will account for the fact that the
# very last half hourly data slot is produced at the beginning of the next day
date = dt.datetime.today() - dt.timedelta(minutes=60)

# convert to correct format:
date = date.strftime("%d-%m-%Y")

# Use IndexSlice to keep only the columns for today's date and drop all
# other date columns
idx = pd.IndexSlice
df = df.loc[:,idx[:,[date]]]

# drop the second level of the multiindex, which is the level containing the date, which 
# is no longer required
df.columns = df.columns.droplevel(1)

This worked fine for the whole of November, but today, the 1st of December, it started throwing errors. I've traced the problem to the first section of code, i.e.:

# create MultiIndex.from_arrays from first row of DataFrame first, then remove first row 
# by df.iloc
df.columns = pd.MultiIndex.from_arrays([df.columns, pd.to_datetime(df.iloc[0])])

The output of which is:

        PV         
Date    2016-11-30  2016-01-12
Date    30/11/2016  01/12/2016 
00:30   4           4
01:00   5           1
01:30   6           7
etc

The problem is in the first row of dates shown above: the first, 2016-11-30, is in YMD order, whereas the second, 2016-01-12, is in YDM order. Why are the date formats different? How would I keep them both as YMD?

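This works, because an explicit format stops to_datetime from guessing the day/month order of each string (30/11/2016 can only be day-first, but 01/12/2016 is ambiguous and was read month-first as 12 January):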

df.columns = pd.MultiIndex.from_arrays([df.columns, pd.to_datetime(df.iloc[0], format='%d/%m/%Y')])
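To see the fix end to end, here is a minimal sketch under stated assumptions. It builds a small frame that mirrors the one in the question (the literal values and the `wanted` date are made up for illustration, not taken from the real CSV). It parses the date row with an explicit format and then selects the column for a single date. It uses `DataFrame.xs` in place of the `IndexSlice` plus `droplevel` steps from the question; that is a swapped-in alternative, not the original poster's method.

```python
import pandas as pd

# Hypothetical frame mirroring the question: the first row holds date strings.
df = pd.DataFrame(
    [["30/11/2016", "01/12/2016"], [4, 4], [5, 1], [6, 7]],
    index=["Date", "00:30", "01:00", "01:30"],
    columns=["PV", "PV"],
)

# Parse the date row with an explicit day/month/year format so that no
# string is interpreted month-first.
dates = pd.to_datetime(df.iloc[0], format="%d/%m/%Y")

# Use the parsed dates as the second column level, then drop the date row.
df.columns = pd.MultiIndex.from_arrays([df.columns, dates])
df = df.iloc[1:]

# Select the column for one date (an assumed example date) and drop the
# date level in the same step.
wanted = pd.Timestamp("2016-12-01")
result = df.xs(wanted, axis=1, level=1)
print(result)
```

In the real script, `wanted` could be derived from the `date` variable computed earlier instead of being hard-coded; comparing against a Timestamp rather than a formatted string avoids relying on a second round of string parsing.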
