Shifting timezone for reshaped pandas dataframe
I am using Pandas dataframes with DatetimeIndex to manipulate timeseries data. The data is stored at UTC time and I usually keep it that way (with naive DatetimeIndex), and only use timezones for output. I like it that way because nothing in the world confuses me more than trying to manipulate timezones.
For example:
In: import numpy as np
import pandas as pd
ts = pd.date_range('2017-01-01 00:00', '2017-12-31 23:30', freq='30min')
data = np.random.rand(17520, 1)
df = pd.DataFrame(data, index=ts, columns=['data'])
df.head()
Out[15]:
data
2017-01-01 00:00:00 0.697478
2017-01-01 00:30:00 0.506914
2017-01-01 01:00:00 0.792484
2017-01-01 01:30:00 0.043271
2017-01-01 02:00:00 0.558461
I want to plot a chart of data versus time for each day of the year, so I reshape the dataframe to have time along the index and dates for columns:
df.index = [df.index.time,df.index.date]
df_new = df['data'].unstack()
In: df_new.head()
Out :
2017-01-01 2017-01-02 2017-01-03 2017-01-04 2017-01-05 \
00:00:00 0.697478 0.143626 0.189567 0.061872 0.748223
00:30:00 0.506914 0.470634 0.430101 0.551144 0.081071
01:00:00 0.792484 0.045259 0.748604 0.305681 0.333207
01:30:00 0.043271 0.276888 0.034643 0.413243 0.921668
02:00:00 0.558461 0.723032 0.293308 0.597601 0.120549
If I'm not worried about timezones, I can plot like this:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df_new.index, df_new)
But I want to plot the data in the local timezone (tz = pytz.timezone('Australia/Sydney')), making allowance for daylight saving time, and the times and dates are no longer Timestamp objects, so I can't use Pandas timezone handling. Or can I?
Assuming I can't, I'm trying to do the shift manually (given DST starts 1/10 at 2am and finishes 1/4 at 2am), so I've got this far:
# columns are datetime.date objects, so compare against dt.date (not dt.datetime)
df_new[[c for c in df_new.columns if c >= dt.date(2017,4,1) and c < dt.date(2017,10,1)]].shift_by(+10)
df_new[[c for c in df_new.columns if c < dt.date(2017,4,1) or c >= dt.date(2017,10,1)]].shift_by(+11)
but I am not sure how to write the function shift_by. (This doesn't handle midnight to 2am on the changeover days correctly, which is not ideal, but I could live with it.)
Use tz_localize + tz_convert on the DatetimeIndex (dt.tz_localize + dt.tz_convert for a datetime column) to convert the dataframe dates to a particular timezone:
df.index = df.index.tz_localize('UTC').tz_convert('Australia/Sydney')
df.index = [df.index.time, df.index.date]
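To see what the conversion does, here is a minimal sketch on two naive UTC timestamps, one in the southern summer and one in winter. The variable names (utc_naive, local) and the sample timestamps are made up for illustration; only tz_localize and tz_convert come from the answer itself.

```python
import pandas as pd

# Two naive timestamps that are understood to be UTC (illustrative values).
utc_naive = pd.DatetimeIndex(['2017-01-01 00:00', '2017-07-01 00:00'])

# First declare them as UTC, then convert to Sydney local time.
local = utc_naive.tz_localize('UTC').tz_convert('Australia/Sydney')

for stamp in local:
    # The UTC offset differs between summer (DST) and winter.
    print(stamp, stamp.utcoffset())
```

The January timestamp lands at 11:00 local time (UTC+11, daylight saving) and the July one at 10:00 (UTC+10), so the DST rules are applied per timestamp and no manual shift is needed.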
Be a little careful when creating the MultiIndex - as you observed, it creates two rows of duplicate timestamps (the local hour starting at 02:00 occurs twice on the day DST ends), so if that's the case, get rid of them with duplicated:
df = df[~df.index.duplicated()]
df = df['data'].unstack()
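Putting the steps together, the following is a sketch of the whole pipeline on the question's data, not a definitive implementation. It assumes the naive index really is UTC; the names local, dupes, n_dupes and wide are hypothetical, the random values are seeded only so the run is repeatable, and pd.MultiIndex.from_arrays is used in place of the list assignment to make the two levels explicit.

```python
import numpy as np
import pandas as pd

# Rebuild the question's half-hourly frame (naive index, understood as UTC).
ts = pd.date_range('2017-01-01 00:00', '2017-12-31 23:30', freq='30min')
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame({'data': rng.random(len(ts))}, index=ts)

# Convert to Sydney local time, then split into (time, date) levels.
local = df.index.tz_localize('UTC').tz_convert('Australia/Sydney')
df.index = pd.MultiIndex.from_arrays([local.time, local.date],
                                     names=['time', 'date'])

# The hour repeated at the end of DST yields duplicate (time, date) pairs.
dupes = df.index.duplicated()
n_dupes = int(dupes.sum())
print(df[dupes])

# Keep the first occurrence of each pair, then pivot dates into columns.
wide = df.loc[~dupes, 'data'].unstack()
print(wide.shape)
```

Under these assumptions the result has 48 rows and 366 columns, because the local dates run into 2018-01-01. The two half-hour slots skipped when DST starts on 2017-10-01 appear as NaN in that column, and so do the slots before 11:00 on the first day and after 10:30 on the last.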
You can also create subplots with df.plot:
import matplotlib.pyplot as plt
df.plot(subplots=True)
plt.show()