简体   繁体   English

Python:按日期时间索引熊猫系列

[英]Python: indexing pandas series by datetime

I am new to python and have the following problem: 我是python新手,有以下问题:

I have daily measurements that I imported into a pd.series object. 我有导入到pd.series对象的每日测量值。 The problem is that the data is such that there are always 31 measurements per month, irrespective of whether it is January or February. 问题在于,数据是这样的,不管是一月还是二月,每月总是有31次测量。 In case the respective month has less than 31 days, the measurements are set to zero for those days that exceed the last day of that month. 如果相应月份少于31天,则将超出该月最后一天的那些天的度量设置为零。 However, missing data within a month is also set to zero. 但是,一个月内丢失的数据也设置为零。 The data for the months of April and May looks like this. 4月和5月月份的数据如下所示。

1990-04-01    25.870
1990-04-01    26.205
1990-04-01    12.283
1990-04-01    19.630
1990-04-01    19.239
1990-04-01    23.614
1990-04-01    40.891
1990-04-01    41.152
1990-04-01    35.935
1990-04-01    25.682
1990-04-01    21.674
1990-04-01    15.818
1990-04-01    11.413
1990-04-01    16.522
1990-04-01    33.543
1990-04-01    28.727
1990-04-01    18.043
1990-04-01    10.326
1990-04-01    19.159
1990-04-01    21.848
1990-04-01    35.250
1990-04-01    39.152
1990-04-01    31.522
1990-04-01    23.152
1990-04-01    13.250
1990-04-01    20.705
1990-04-01    27.304
1990-04-01    24.478
1990-04-01    33.674
1990-04-01    32.591
1990-04-01     0.000
1990-05-01    40.370
1990-05-01    41.609
1990-05-01    47.478
1990-05-01    40.682
1990-05-01    42.587
1990-05-01    38.826
1990-05-01    35.543
1990-05-01    30.955
1990-05-01    23.543
1990-05-01     7.857
1990-05-01     0.000
1990-05-01     0.000
1990-05-01     0.000
1990-05-01     0.000
1990-05-01     0.000
1990-05-01     0.000
1990-05-01    54.133
1990-05-01    41.114
1990-05-01    44.739
1990-05-01    43.848
1990-05-01    26.739
1990-05-01    21.318
1990-05-01    26.750
1990-05-01    54.864
1990-05-01    33.000
1990-05-01    33.304
1990-05-01    34.304
1990-05-01    20.886
1990-05-01    20.250
1990-05-01    24.804
1990-05-01    28.091
Length: 62

Is there a way to drop those entries that do not belong to the respective months and provide a new time-index for each day? 是否可以删除不属于相应月份的条目并提供每天的新时间索引? I need to keep the zeros within a month. 我需要在一个月内保持零。

First I would replace the 0s with NaN (to symbolise missing data): 首先,我将NaS替换为0(以表示丢失的数据):

s.replace(0, np.nan, inplace=True)

One way is to create a function which acts on each group (month): 一种方法是创建一个对每个组(月)起作用的函数:

def f(s_month):
    date = s_month.index[0]
    # create the month long range
    rng = pd.date_range(date, date + pd.offsets.MonthEnd(1), freq='D')
    # throw away results longer than month length
    s_month = s_month.iloc[0:len(rng)]
    # reindex those remaining
    s_month.index = rng
    return s_month

Note: this requires that you have a DatetimeIndex, ie s.index = pd.to_datetime(s.index) . 注意:这要求您具有DatetimeIndex,即s.index = pd.to_datetime(s.index)

In [11]: s_month = s.loc['1990-04-01']

In [12]: f(s_month)
Out[12]: 
1990-04-01    25.870
1990-04-02    26.205
1990-04-03    12.283
1990-04-04    19.630
...
1990-04-28    24.478
1990-04-29    33.674
1990-04-30    32.591
Freq: D, Name: Value, dtype: float64

Use this with a groupby apply: 将其与groupby一起使用:

In [13]: res = s.groupby(s.index).apply(f)

In [14]: res
Out[14]: 
1990-04-01  1990-04-01    25.870
            1990-04-02    26.205
            1990-04-03    12.283
            1990-04-04    19.630
...

Correct the MultiIndex: 更正MultiIndex:

In [15]: res.index = res.index.droplevel(0)

In [16]: res
Out[16]: 
1990-04-01    25.870
1990-04-02    26.205
1990-04-03    12.283
...

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM