

23-25 hour days on a pandas DataFrame DatetimeIndex

I have a pandas DataFrame indexed by a DatetimeIndex. The frequency of the index is variable, but the data is mostly sampled at one-minute intervals.

Because of a database problem, daylight saving time is not handled properly in the index, so on certain months/days the index contains duplicated values. Is there a way (without using time zones) to handle 23- and 25-hour days in pandas so that I can keep a linear track of time across records?

Here is a small example of my problem:

DatetimeIndex(['2014-03-12 22:59:59', '2014-03-12 22:59:59',
           '2014-03-12 23:00:59', '2014-03-12 23:00:59',
           '2014-03-12 23:01:59', '2014-03-12 23:02:59',
           '2014-03-12 23:02:59', '2014-03-12 23:03:59',
           '2014-03-12 23:03:59', '2014-03-12 23:04:59',
           '2014-03-12 23:04:59', '2014-03-12 23:05:59',
           '2014-03-12 23:06:59', '2014-03-12 23:06:59',
           '2014-03-12 23:07:59', '2014-03-12 23:07:59',
           '2014-03-12 23:08:59', '2014-03-12 23:09:59',
           '2014-03-12 23:09:59', '2014-03-12 23:10:59',
           '2014-03-12 23:10:59', '2014-03-12 23:11:59',
           '2014-03-12 23:11:59', '2014-03-12 23:12:59',
           '2014-03-12 23:13:59', '2014-03-12 23:13:59',
           '2014-03-12 23:14:59', '2014-03-12 23:14:59',
           '2014-03-12 23:15:59', '2014-03-12 23:16:59',
           '2014-03-12 23:16:59', '2014-03-12 23:17:59',
           '2014-03-12 23:17:59', '2014-03-12 23:18:59',
           '2014-03-12 23:19:59', '2014-03-12 23:19:59',
           '2014-03-12 23:20:59', '2014-03-12 23:20:59',
           '2014-03-12 23:21:59', '2014-03-12 23:22:59',
           '2014-03-12 23:22:59', '2014-03-12 23:23:59',
           '2014-03-12 23:24:59', '2014-03-12 23:24:59',
           '2014-03-12 23:25:59', '2014-03-12 23:26:59',
           '2014-03-12 23:26:59', '2014-03-12 23:27:59',
           '2014-03-12 23:27:59', '2014-03-12 23:28:59',
           '2014-03-12 23:28:59', '2014-03-12 23:29:59',
           '2014-03-12 23:30:59', '2014-03-12 23:30:59',
           '2014-03-12 23:31:59', '2014-03-12 23:31:59',
           '2014-03-12 23:32:59', '2014-03-12 23:33:59',
           '2014-03-12 23:33:59', '2014-03-12 23:34:59',
           '2014-03-12 23:34:59', '2014-03-12 23:35:59',
           '2014-03-12 23:36:59', '2014-03-12 23:36:59',
           '2014-03-12 23:37:59', '2014-03-12 23:38:59',
           '2014-03-12 23:38:59', '2014-03-12 23:39:59',
           '2014-03-12 23:40:59', '2014-03-12 23:40:59',
           '2014-03-12 23:41:59', '2014-03-12 23:42:59',
           '2014-03-12 23:42:59', '2014-03-12 23:43:59',
           '2014-03-12 23:44:59', '2014-03-12 23:44:59',
           '2014-03-12 23:45:59', '2014-03-12 23:46:59',
           '2014-03-12 23:46:59', '2014-03-12 23:47:59',
           '2014-03-12 23:48:59', '2014-03-12 23:48:59',
           '2014-03-12 23:49:59', '2014-03-12 23:49:59',
           '2014-03-12 23:50:59', '2014-03-12 23:51:59',
           '2014-03-12 23:51:59', '2014-03-12 23:52:59',
           '2014-03-12 23:52:59', '2014-03-12 23:54:59',
           '2014-03-12 23:56:59', '2014-03-12 23:58:59',
           '2014-03-12 23:54:00', '2014-03-12 23:55:59',
           '2014-03-12 23:56:59', '2014-03-12 23:57:59',
           '2014-03-12 23:59:59'],
          dtype='datetime64[ns]', name='Timestamp', freq=None)  
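You can measure how widespread the repeats are with `Index.duplicated` and `Index.is_unique`. The sketch below uses only the first six timestamps from the listing above as a stand-in for the full index.

```python
import pandas as pd

# A few timestamps copied from the question's index (stand-in for the full data)
index = pd.DatetimeIndex(
    ["2014-03-12 22:59:59", "2014-03-12 22:59:59",
     "2014-03-12 23:00:59", "2014-03-12 23:00:59",
     "2014-03-12 23:01:59", "2014-03-12 23:02:59"],
    name="Timestamp",
)

# keep=False flags every occurrence of a repeated label, not just the later ones
repeated = index.duplicated(keep=False)

print(index.is_unique)            # False
print(int(repeated.sum()))        # 4
print(index[repeated].unique())   # the two timestamps that appear twice
```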

Your problem is that a DatetimeIndex is immutable. You can't modify it with in-place operations, so you have to build a new index and write it over the old one.
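The sketch below shows what "immutable" means in practice, using a made-up two-row frame. Item assignment on a pandas Index raises `TypeError`. Building a new index and assigning it to `df.index` works.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2014-03-12 23:00:59", "2014-03-12 23:00:59"], name="Timestamp")
df = pd.DataFrame({"value": [1, 2]}, index=idx)  # made-up readings

# Item assignment on a pandas Index is rejected: indexes are immutable
refused = False
try:
    df.index[1] = pd.Timestamp("2014-03-13 00:00:59")
except TypeError as exc:
    refused = True
    print("in-place edit refused:", exc)

# Instead, build a new index and write it over the old one
labels = df.index.to_list()
labels[1] = pd.Timestamp("2014-03-13 00:00:59")
df.index = pd.DatetimeIndex(labels, name=df.index.name)

print(refused, df.index.is_unique)  # True True
```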

One solution could be to "unroll" the index so that it keeps the same number of time steps, but every other timestamp is pushed forward (or backward) by an hour.
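The same "every other timestamp" idea can also be expressed without splitting the frame. Add a zero offset at even positions and a one-hour offset at odd positions, so the rows never change order. This is an alternative formulation, not the answerer's code. Like the answer, it assumes the repeated readings strictly alternate, which the question's data does not always do (for example, 23:01:59 appears only once). `unroll_alternate` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def unroll_alternate(index: pd.DatetimeIndex, hours: int = 1) -> pd.DatetimeIndex:
    """Shift every second timestamp (positions 1, 3, 5, ...) forward by `hours`.

    Hypothetical helper: assumes duplicated readings strictly alternate.
    Positions are preserved, so rows keep their order.
    """
    offsets = np.zeros(len(index), dtype="int64")
    offsets[1::2] = hours
    shifted = index + pd.to_timedelta(offsets, unit="h")
    return shifted.rename(index.name)  # keep the original index name


idx = pd.DatetimeIndex(
    ["2014-03-12 23:00:59", "2014-03-12 23:00:59",
     "2014-03-12 23:01:59", "2014-03-12 23:01:59"],
    name="Timestamp",
)
result = unroll_alternate(idx)
print(result)
```

Because only the labels change, you can assign the result directly with `df.index = unroll_alternate(df.index)` without misaligning any data.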

Below, I refer to the index from the question (the OP) as index:

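import pandas as pd

df = pd.DataFrame(index=index)

# rows at even positions (0, 2, 4, ...) keep their timestamps
first_step = df.iloc[::2]

## shift every second row forward by one hour, starting from the second value ##
# moving the rows together with their labels keeps any data aligned with its timestamp
second_step = df.iloc[1::2].copy()
second_step.index = second_step.index + pd.Timedelta(hours=1)

# rebuild the frame from both halves, in chronological order
df = pd.concat([first_step, second_step]).sort_index()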
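One thing to watch with this pattern: assigning an index to `df.index` relabels the existing rows by position and does not move any data. If the new index is in a different order from the old one, values end up under another row's timestamp. The sketch below shows the effect on a frame with a made-up `value` column. It compares direct assignment with moving the rows and their labels together through `pd.concat`.

```python
import pandas as pd

idx = pd.DatetimeIndex(
    ["2014-03-12 23:00:59", "2014-03-12 23:00:59",
     "2014-03-12 23:01:59", "2014-03-12 23:01:59"],
    name="Timestamp",
)
# Made-up readings; each value should stay with the timestamp it was recorded at
df = pd.DataFrame({"value": [10, 11, 12, 13]}, index=idx)

# Even positions unchanged, odd positions pushed forward one hour, then concatenated
evens = df.index[::2]
odds = df.index[1::2] + pd.Timedelta(hours=1)
reordered = evens.append(odds)

# Direct assignment: labels are replaced position by position, rows do not move
relabelled = df.copy()
relabelled.index = reordered

# Moving rows together with their labels keeps every value attached
moved = pd.concat([df.iloc[::2], df.iloc[1::2].set_axis(odds)])

t = pd.Timestamp("2014-03-12 23:01:59")
print(relabelled.loc[t, "value"])  # 11, which was actually recorded at 23:00:59
print(moved.loc[t, "value"])       # 12, the reading recorded at 23:01:59
```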

I can't help feeling this is a bit of a hack, so let me know whether it helps.

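Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or link to the original. For any questions, contact: yoyou2525@163.com.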

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM