Shifting a time series with missing dates in Pandas

Question

I have a time series with some missing entries that looks like this:

date     value
---------------
2000       5
2001      10
2003      8
2004      72
2005      12
2007      13

I would like to create a column for the "previous_value", but I only want it to show values for consecutive years. So I want it to look like this:

date     value    previous_value
-------------------------------
2000       5        nan
2001      10         5
2003      8         nan
2004      72         8
2005      12        72
2007      13        nan

However, just applying the pandas shift function directly to the column 'value' would give 'previous_value' = 10 for 'date' = 2003, and 'previous_value' = 12 for 'date' = 2007.

What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute.)

Answer 1

In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
                              'value':[5,10,8,72,12,13] })

In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]

In [590]: df
Out[590]: 
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN
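The mask in In [589] keeps a shifted value only where the current date is exactly one more than the date in the row above. A minimal sketch of the same idea, written with diff() and where() and assuming the frame is already sorted by date; the helper name shift_if_consecutive and its parameters are hypothetical, not part of pandas:

```python
import pandas as pd

def shift_if_consecutive(df, key="date", col="value", step=1):
    """Shift `col` down one row, keeping a value only where `key`
    increased by exactly `step` relative to the previous row."""
    consecutive = df[key].diff() == step   # first row compares NaN == step -> False
    return df[col].shift().where(consecutive)

df = pd.DataFrame({"date": [2000, 2001, 2003, 2004, 2005, 2007],
                   "value": [5, 10, 8, 72, 12, 13]})
df["previous_value"] = shift_if_consecutive(df)
print(df)
```

Rows that follow a gap (2003 and 2007 here) get NaN, as does the first row.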

Also see here for a time series approach using resample(): Using shift() with unevenly spaced data
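Since the dates in the question are plain integer years, a comparable effect can be had without resample() by reindexing onto the full range of years, shifting, and then selecting the original years again. This is only a sketch of that idea (it swaps reindex() in for resample()) and assumes the years are unique:

```python
import pandas as pd

df = pd.DataFrame({"date": [2000, 2001, 2003, 2004, 2005, 2007],
                   "value": [5, 10, 8, 72, 12, 13]})

s = df.set_index("date")["value"]

# Every year between the first and the last, including the missing ones
full_years = pd.RangeIndex(int(s.index.min()), int(s.index.max()) + 1)

# Missing years become NaN rows, so a plain shift() now means "previous year"
prev = s.reindex(full_years).shift()

# Keep only the years that exist in the original data
df["previous_value"] = prev.reindex(s.index).to_numpy()
print(df)
```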

Answer 2

Your example doesn't look like real time series data with timestamps. Let's take another example with the missing date 2020-01-03:

import pandas as pd

# Five consecutive days, then drop 2020-01-03 to create a gap
df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df.drop(pd.Timestamp('2020-01-03'), inplace=True)

            val
2020-01-01   10
2020-01-02   20
2020-01-04   40
2020-01-05   50

To shift by one day you can set the freq parameter to 'D':

df.shift(1, freq='D')

Output:

            val
2020-01-02   10
2020-01-03   20
2020-01-05   40
2020-01-06   50
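Note that with freq set, the values stay where they are and the index labels move, whereas a plain shift(1) moves the values down one row and leaves the index alone. A small sketch contrasting the two on the same data (the variable names positional and by_freq are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"val": [10, 20, 40, 50]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"]),
)

positional = df.shift(1)         # values move down one row, index unchanged
by_freq = df.shift(1, freq="D")  # index labels move forward one day, values unchanged

# Positional shift puts the 2020-01-02 value on 2020-01-04, ignoring the gap
print(positional)
print(by_freq)
```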

To combine the original data with the shifted data, you can merge both tables:

df.merge(df.shift(1, freq='D'),
         left_index=True,
         right_index=True,
         how='left',
         suffixes=('', '_previous'))

Output:

            val  val_previous
2020-01-01   10           NaN
2020-01-02   20          10.0
2020-01-04   40           NaN
2020-01-05   50          40.0
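As a possible shortcut to the merge, assigning the shifted Series as a new column should give the same result, because pandas aligns on the index during assignment and drops labels that are not in the original frame. A sketch under the assumption that the index has no duplicate timestamps:

```python
import pandas as pd

df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df = df.drop(pd.Timestamp("2020-01-03"))

# The shifted Series is labelled 01-02, 01-03, 01-05, 01-06;
# assignment keeps only the labels that exist in df and fills the rest with NaN
df["val_previous"] = df["val"].shift(1, freq="D")
print(df)
```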

You can find other offset aliases here
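The same idea can be carried over to the yearly data from the question by first turning the integer years into timestamps. The sketch below passes a pd.DateOffset(years=1) object as freq instead of an alias string, which is an assumption made here to stay independent of how yearly aliases are spelled in a given pandas version:

```python
import pandas as pd

df = pd.DataFrame({"date": [2000, 2001, 2003, 2004, 2005, 2007],
                   "value": [5, 10, 8, 72, 12, 13]})

# Interpret each integer year as the timestamp of its first day
stamps = pd.to_datetime(df["date"].astype(str), format="%Y")
s = pd.Series(df["value"].to_numpy(), index=pd.DatetimeIndex(stamps))

# Move every label forward by one calendar year, then look up the original labels
prev = s.shift(1, freq=pd.DateOffset(years=1))
df["previous_value"] = prev.reindex(s.index).to_numpy()
print(df)
```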

Answer 1: 8 votes, accepted, 2015-03-11 21:33:45

Answer 2: 1 vote, 2021-07-18 19:33:26