Describing gaps in a time series with pandas

Question

I'm trying to write a function that takes a continuous time series and returns a data structure describing any gaps in the data (e.g. a DataFrame with columns 'start' and 'end'). It seems like a fairly common problem for time series, but despite experimenting with groupby, diff, and the like, and searching SO, I haven't been able to come up with anything much better than the code below.

It's a priority for me that this uses vectorized operations to remain efficient. There has got to be a more obvious solution using vectorized operations, hasn't there? Thanks for any help, folks.

import pandas as pd


def get_gaps(series):
    """
    @param series: a continuous time series of data with the index's freq set
    @return: a series where the index is the start of gaps, and the values are
         the ends
    """
    missing = series.isnull()
    different_from_last = missing.diff()

    # any row not missing while the last was is a gap end        
    gap_ends = series[~missing & different_from_last].index

    # count the start as different from the last
    different_from_last[0] = True

    # any row missing while the last wasn't is a gap start
    gap_starts = series[missing & different_from_last].index        

    # check and remedy if series ends with missing data
    if len(gap_starts) > len(gap_ends):
         gap_ends = gap_ends.append(series.index[-1:] + series.index.freq)

    return pd.Series(index=gap_starts, data=gap_ends)

For the record: Pandas==0.13.1, Numpy==1.8.1, Python 2.7

Answer 1

This problem can be reduced to finding runs of consecutive numbers in a list. Find all the positions where the series is null; if a run such as (3, 4, 5, 6) is entirely null, you only need to extract its start and end, (3, 6).
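To see why grouping works here, a tiny standalone sketch on a plain list may help. The list `null_positions` is a made-up example, not data from the question. Within a run of consecutive integers, the enumeration counter minus the value stays constant, so `itertools.groupby` keyed on that difference puts each run in its own group.

```python
from itertools import groupby

# Hypothetical positions of missing values, for illustration only.
null_positions = [3, 4, 5, 6, 10, 11]

runs = []
# (counter - value) is -3 for 3..6 and -6 for 10..11, so each run
# of consecutive positions ends up in a separate group.
for _, group in groupby(enumerate(null_positions),
                        key=lambda pair: pair[0] - pair[1]):
    values = [value for _, value in group]
    runs.append((values[0], values[-1]))

print(runs)  # [(3, 6), (10, 11)]
```

The same idea applied to a pandas Series looks like this: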

import numpy as np
import pandas as pd
from operator import itemgetter
from itertools import groupby


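def find_gap(s):
    """Return a Series mapping the first position of each run of nulls
    to the last position of that run (both inclusive).
    """
    # integer positions (not index labels) of the missing values;
    # they coincide with the labels in the example below
    nullindex = np.where(s.isnull())[0]
    ranges = []
    # running count minus position is constant within a run of
    # consecutive positions, so each group is one gap
    for k, g in groupby(enumerate(nullindex), lambda pair: pair[0] - pair[1]):
        group = list(map(itemgetter(1), g))
        ranges.append((group[0], group[-1]))
    if not ranges:
        # no missing values: return an empty Series instead of failing on zip
        return pd.Series([], dtype=int)
    startgap, endgap = zip(*ranges)
    return pd.Series(endgap, index=startgap)


# create an example (written so it runs on both Python 2.7 and Python 3)
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
s = pd.Series(data, index=data)
s = s.reindex(range(18))
print(find_gap(s))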

Reference: Identify groups of continuous numbers in a list
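
Since the question asks for vectorized operations, here is a minimal sketch of the same run-detection idea without a Python-level loop. It is not from the original answer. It assumes a recent pandas and Python 3 rather than the 0.13.1 / 2.7 setup in the question, and the name `find_gaps_vectorized` is made up for illustration. It pads the null mask with False on both sides and uses `np.diff` to find where runs begin and end.

```python
import numpy as np
import pandas as pd


def find_gaps_vectorized(s):
    """Return a DataFrame with one row per run of missing values.

    'start' and 'end' are the index labels of the first and last
    missing entry in each run (both inclusive).
    """
    missing = s.isnull().to_numpy()
    # Pad with False so runs touching either edge still have a transition.
    padded = np.concatenate(([False], missing, [False])).astype(np.int8)
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)      # first missing position of each run
    ends = np.flatnonzero(change == -1) - 1   # last missing position of each run
    return pd.DataFrame({"start": s.index[starts], "end": s.index[ends]})


# Same example data as in the answer above.
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
s = pd.Series(data, index=data).reindex(range(18))
gaps = find_gaps_vectorized(s)
print(gaps)
```

For this example the result has two rows, (0, 1) and (6, 11), matching `find_gap`. Note that 'end' here is the last missing label, whereas `get_gaps` in the question reports the first non-missing timestamp after the gap. For a series with a fixed `freq`, adding one period to 'end' should give that convention.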

Answer 1
2 2014-07-18 13:12:33