
Python: finding the first set of consecutive numbers (including repeated numbers)

I have the following data.

 {5072: Timedelta('0 days 00:00:00'), 5085: Timedelta('0 days 00:00:00'), 5107: Timedelta('0 days 00:00:00'), 5126: Timedelta('1 days 00:00:00'), 5169: Timedelta('1 days 00:00:00'), 5211: Timedelta('2 days 00:00:00'), 5222: Timedelta('3 days 00:00:00'), 5247: Timedelta('3 days 00:00:00'), 5287: Timedelta('18 days 00:00:00'), 5310: Timedelta('21 days 00:00:00'), 5333: Timedelta('22 days 00:00:00'), 5381: Timedelta('23 days 00:00:00'), 5419: Timedelta('24 days 00:00:00')}

                timeDiff
5072            0 days
5085            0 days
5107            0 days
5126            1 days
5169            1 days
5211            2 days
5222            3 days
5247            3 days
5287           18 days
5310           21 days
5333           22 days
5381           23 days
5419           24 days

The series has dtype timedelta64[ns].

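For a series like this, I want to get the first set of consecutive numbers, including repeated numbers. So in this case, it would return the indexes. (Note that I need the indexes because I will later use them to slice a pandas DataFrame.)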

I would like the function to return [0 days, 0 days, 0 days, 1 days, 1 days, 2 days, 3 days, 3 days].

Would a function like this help in your case? It takes a list and finds the consecutive sublist starting from a given start index.

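def get_consecutive_indexs(lst, start_from=0):
    """Return the run of values starting at start_from in which each step is 0 or +1."""
    num = lst[start_from]
    index = start_from

    # Advance while the current value equals the previous one or exceeds it by exactly 1.
    while index < len(lst) and (lst[index] - num in (0, 1)):
        num = lst[index]
        index += 1
    # Despite the name, this returns the values themselves, not their positions.
    return lst[start_from:index]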

If your days are strings of the form "X days", you can do:

get_consecutive_indexs([int(day.split(' ')[0]) for day in days])
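Since the question's series holds timedelta64[ns] values rather than strings, and the question asks for index labels rather than values, the function above needs a small wrapper. The following is a minimal sketch under a few assumptions: the names time_diff, days, run, and labels are illustrative and not from the original posts, the series is rebuilt by hand from the question's data, and the run is taken from the start of the series (start_from=0).

```python
import pandas as pd

def get_consecutive_indexs(lst, start_from=0):
    # Same logic as the answer above: keep going while each step is 0 or +1.
    num = lst[start_from]
    index = start_from
    while index < len(lst) and (lst[index] - num in (0, 1)):
        num = lst[index]
        index += 1
    return lst[start_from:index]

# Rebuild the question's series (hypothetical variable name).
day_values = [0, 0, 0, 1, 1, 2, 3, 3, 18, 21, 22, 23, 24]
index_labels = [5072, 5085, 5107, 5126, 5169, 5211, 5222,
                5247, 5287, 5310, 5333, 5381, 5419]
time_diff = pd.Series([pd.Timedelta(days=d) for d in day_values],
                      index=index_labels)

# .dt.days turns each Timedelta into an integer number of days.
days = time_diff.dt.days.tolist()
run = get_consecutive_indexs(days)

# The run starts at position 0, so its length selects the matching labels.
labels = time_diff.index[:len(run)].tolist()
print(labels)
```

The resulting labels can then be passed to .loc on the DataFrame that shares this index.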

Here is a function that takes a Series containing Timedeltas from datetime. I have provided some tests with a normal index, but it also works with non-range indexes.

import pandas as pd
import datetime as dt

def getConsecutiveIndexes(series, minimumLength=2, maximumDelta=1):
    """
    Return the index labels of the first run whose Timedelta day values are consecutive (each step within maximumDelta) and that has at least minimumLength entries.

    :param pd.Series series: Sorted series containing Timedeltas.
    :param minimumLength: Minimum length of list to be returned.
    :param maximumDelta: Maximum difference to be considered consecutive.
    """
    indexes = []
    previousValue = None
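    # enumerate() supplies the position; the label i differs from it on non-range indexes.
    for position, (i, timedelta) in enumerate(series.items()):
        val = timedelta.days
        previousValueDelta = 0 if previousValue is None else val - previousValue
        isConsecutive = previousValueDelta <= maximumDelta
        lastIndex = position == len(series) - 1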
        
        if isConsecutive or not indexes:
            indexes.append(i)

        satisfiedLength = len(indexes) >= minimumLength
        
        if (lastIndex or not isConsecutive) and satisfiedLength:
            return indexes

        if not isConsecutive:
            indexes = [i]

        previousValue = val

def createTimedeltaSeries(l):
    """Easily create a series containing Timedeltas."""
    return pd.Series([dt.timedelta(days=value) for value in l])

assert [0, 1, 2, 3, 4, 5, 6, 7] == getConsecutiveIndexes(createTimedeltaSeries([0, 0, 0, 1, 1, 2, 3, 3, 7, 9]))
assert [1, 2]                   == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]))
assert [0, 1, 2, 3]             == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]),   maximumDelta=2)
assert [2, 3, 4]                == getConsecutiveIndexes(createTimedeltaSeries([1, 2, 4, 5, 6]),    minimumLength=3)
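
As an alternative to the explicit loop, the first run can also be found with vectorized pandas operations. This is only a sketch, not the method from either answer: the function name first_consecutive_labels and the parameter max_gap_days are hypothetical, the series is assumed to be sorted and of dtype timedelta64[ns], and there is no minimum-length check, so a run of length one is returned as-is.

```python
import pandas as pd

def first_consecutive_labels(series, max_gap_days=1):
    """Return index labels of the first run whose day values step by at most max_gap_days."""
    days = series.dt.days
    # A new run starts wherever the gap to the previous row exceeds the threshold.
    # The first row's diff is NaN, and NaN > x is False, so it never starts a new run.
    breaks = days.diff() > max_gap_days
    # Cumulative count of breaks gives each row a run number; the first run is 0.
    run_id = breaks.cumsum()
    return series.index[(run_id == 0).to_numpy()].tolist()

# The question's data, rebuilt by hand with its non-range index.
example = pd.Series(
    [pd.Timedelta(days=d) for d in [0, 0, 0, 1, 1, 2, 3, 3, 18, 21, 22, 23, 24]],
    index=[5072, 5085, 5107, 5126, 5169, 5211, 5222,
           5247, 5287, 5310, 5333, 5381, 5419],
)
result = first_consecutive_labels(example)
print(result)
```

Because the result holds labels rather than positions, it can be used directly with .loc on a DataFrame that shares the same index.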

