Python: finding the first set of consecutive numbers (including repeated numbers)

I have the following data.

 {5072: Timedelta('0 days 00:00:00'), 5085: Timedelta('0 days 00:00:00'), 5107: Timedelta('0 days 00:00:00'), 5126: Timedelta('1 days 00:00:00'), 5169: Timedelta('1 days 00:00:00'), 5211: Timedelta('2 days 00:00:00'), 5222: Timedelta('3 days 00:00:00'), 5247: Timedelta('3 days 00:00:00'), 5287: Timedelta('18 days 00:00:00'), 5310: Timedelta('21 days 00:00:00'), 5333: Timedelta('22 days 00:00:00'), 5381: Timedelta('23 days 00:00:00'), 5419: Timedelta('24 days 00:00:00')}

                timeDiff
5072            0 days
5085            0 days
5107            0 days
5126            1 days
5169            1 days
5211            2 days
5222            3 days
5247            3 days
5287           18 days
5310           21 days
5333           22 days
5381           23 days
5419           24 days

The series has dtype timedelta64[ns].

For a series like this, I want to get the first set of consecutive numbers, counting repeated numbers as part of the run. The function should return the indices of that run. (I need the indices because I will use them to slice a pandas DataFrame later on.)

For the data above, the function should return the indices of [0 days, 0 days, 0 days, 1 days, 1 days, 2 days, 3 days, 3 days], that is, 5072 through 5247.

Would a function like this help in your case? It takes a list and returns the consecutive sublist that begins at a given start index. Note that it returns the values of the run, not their positions.

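def get_consecutive_indexs(lst, start_from=0):
    # num holds the previous value seen in the run.
    num = lst[start_from]
    index = start_from

    # Advance while each value equals the previous one or exceeds it by exactly 1.
    while index < len(lst) and (lst[index] - num in (0, 1)):
        num = lst[index]
        index += 1
    return lst[start_from:index]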

If your day values are strings of the form "X days", you can convert them first:

get_consecutive_indexs([int(day.split(' ')[0]) for day in days])
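Since the question asks for indices, here is a minimal sketch of a variant that returns positions instead of values. The name first_consecutive_positions and the separate labels list are assumptions made for illustration; they are not part of the answer above.

```python
def first_consecutive_positions(days, start_from=0):
    """Return the positions of the run starting at start_from in which each
    value equals the previous one or exceeds it by exactly 1."""
    if start_from >= len(days):
        return []
    end = start_from + 1
    while end < len(days) and days[end] - days[end - 1] in (0, 1):
        end += 1
    return list(range(start_from, end))


# Day counts and index labels copied from the question's data.
days = [0, 0, 0, 1, 1, 2, 3, 3, 18, 21, 22, 23, 24]
labels = [5072, 5085, 5107, 5126, 5169, 5211, 5222, 5247,
          5287, 5310, 5333, 5381, 5419]

positions = first_consecutive_positions(days)
first_labels = [labels[p] for p in positions]
print(first_labels)
```

With a pandas Series of timedeltas, the days list could presumably be obtained with series.dt.days.tolist(), and the positions could then be passed to DataFrame.iloc.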

Here's a function that takes a Series of Timedeltas (built here from datetime.timedelta values). The tests below use the default integer index, but the function is intended to work with non-range indexes as well.

import pandas as pd
import datetime as dt

def getConsecutiveIndexes(series, minimumLength=2, maximumDelta=1):
    """
    Return the index labels of the first run of Timedeltas whose day values rise by at most maximumDelta from one entry to the next, provided the run has at least minimumLength entries.

    :param pd.Series series: Sorted series containing Timedeltas.
    :param minimumLength: Minimum length of list to be returned.
    :param maximumDelta: Maximum difference to be considered consecutive.
    """
    indexes = []
    previousValue = None
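    # enumerate() tracks the position separately from the index label, so the
    # end-of-series check also works when the index is not 0..n-1.
    for position, (i, timedelta) in enumerate(series.items()):
        val = timedelta.days
        previousValueDelta = 0 if previousValue is None else val - previousValue
        isConsecutive = previousValueDelta <= maximumDelta
        lastIndex = position == len(series) - 1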
        
        if isConsecutive or not indexes:
            indexes.append(i)

        satisfiedLength = len(indexes) >= minimumLength
        
        if (lastIndex or not isConsecutive) and satisfiedLength:
            return indexes

        if not isConsecutive:
            indexes = [i]

        previousValue = val

def createTimedeltaSeries(l):
    """Easily create a series containing Timedeltas."""
    return pd.Series([dt.timedelta(days=value) for value in l])

assert [0, 1, 2, 3, 4, 5, 6, 7] == getConsecutiveIndexes(createTimedeltaSeries([0, 0, 0, 1, 1, 2, 3, 3, 7, 9]))
assert [1, 2]                   == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]))
assert [0, 1, 2, 3]             == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]),   maximumDelta=2)
assert [2, 3, 4]                == getConsecutiveIndexes(createTimedeltaSeries([1, 2, 4, 5, 6]),    minimumLength=3)
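As a point of comparison, here is a vectorized sketch of the same idea applied to the question's data, including the DataFrame slicing step the question mentions. It uses a different technique from the loop above (Series.diff plus a cumulative sum over the breaks), it has no minimum-length option, and it assumes the series is sorted. The names first_run_labels and max_delta are hypothetical.

```python
import pandas as pd


def first_run_labels(series, max_delta=1):
    """Return the index labels of the leading run in which the day count
    rises by at most max_delta from one row to the next."""
    days = series.dt.days
    # True wherever the step from the previous row is too large (a break).
    # The first row's diff is NaN, which compares as False.
    breaks = days.diff() > max_delta
    # Counting the breaks so far numbers the runs; the first run is run 0.
    run_id = breaks.cumsum()
    return series[run_id == 0].index.tolist()


# Index labels and day counts copied from the question.
index_labels = [5072, 5085, 5107, 5126, 5169, 5211, 5222, 5247,
                5287, 5310, 5333, 5381, 5419]
day_values = [0, 0, 0, 1, 1, 2, 3, 3, 18, 21, 22, 23, 24]
time_diff = pd.Series([pd.Timedelta(days=d) for d in day_values],
                      index=index_labels)
df = pd.DataFrame({"timeDiff": time_diff})

first_labels = first_run_labels(time_diff)
subset = df.loc[first_labels]  # label-based slice of the DataFrame
print(subset)
```

Because the result is a list of labels rather than positions, it goes to DataFrame.loc and not DataFrame.iloc.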
