I have the following data.
{5072: Timedelta('0 days 00:00:00'), 5085: Timedelta('0 days 00:00:00'), 5107: Timedelta('0 days 00:00:00'), 5126: Timedelta('1 days 00:00:00'), 5169: Timedelta('1 days 00:00:00'), 5211: Timedelta('2 days 00:00:00'), 5222: Timedelta('3 days 00:00:00'), 5247: Timedelta('3 days 00:00:00'), 5287: Timedelta('18 days 00:00:00'), 5310: Timedelta('21 days 00:00:00'), 5333: Timedelta('22 days 00:00:00'), 5381: Timedelta('23 days 00:00:00'), 5419: Timedelta('24 days 00:00:00')}
timeDiff
5072 0 days
5085 0 days
5107 0 days
5126 1 days
5169 1 days
5211 2 days
5222 3 days
5247 3 days
5287 18 days
5310 21 days
5333 22 days
5381 23 days
5419 24 days
The series is of type timedelta64[ns].
For a series like this, I want to get the first run of consecutive values, counting repeated values as part of the run, and return its indices. (Note that I need the indices because I will use them to slice a pandas DataFrame later on.)
In this case, the function should return the indices of [0 days, 0 days, 0 days, 1 days, 1 days, 2 days, 3 days, 3 days].
Would a function like this help in your case? It receives a list and finds the consecutive sublist starting from a given index. Note that it returns the values of that run, not their positions.
def get_consecutive_indexs(lst, start_from=0):
    num = lst[start_from]
    index = start_from
    # Advance while each value equals the previous one or exceeds it by exactly 1.
    while index < len(lst) and (lst[index] - num in (0, 1)):
        num = lst[index]
        index += 1
    return lst[start_from:index]
If your day values are strings of the form "X days", you can convert them to integers first:
get_consecutive_indexs([int(day.split(' ')[0]) for day in days])
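Since the question asks for index labels rather than values, here is a minimal sketch of the same loop applied directly to a timedelta64[ns] Series. The name `first_consecutive_labels` and the `max_step` parameter are made up for illustration, and the sketch assumes the series is sorted in ascending order.

```python
import pandas as pd


def first_consecutive_labels(series, max_step=1):
    """Return the index labels of the leading run in which each day count
    is 0..max_step larger than the previous one (hypothetical helper)."""
    days = series.dt.days.tolist()
    end = min(1, len(days))  # an empty series yields an empty run
    while end < len(days) and 0 <= days[end] - days[end - 1] <= max_step:
        end += 1
    return series.index[:end].tolist()


# Data from the question: day counts keyed by the original index labels.
time_diff = pd.Series(
    [pd.Timedelta(days=d) for d in [0, 0, 0, 1, 1, 2, 3, 3, 18, 21, 22, 23, 24]],
    index=[5072, 5085, 5107, 5126, 5169, 5211, 5222, 5247,
           5287, 5310, 5333, 5381, 5419],
)

labels = first_consecutive_labels(time_diff)
print(labels)
```

The returned labels can then be passed to `DataFrame.loc` to select the matching rows.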
Here's a function that takes a Series containing Timedeltas from the datetime package. I provided some tests with normal indexes, but it will also work with non-range indexes.
import pandas as pd
import datetime as dt


def getConsecutiveIndexes(series, minimumLength=2, maximumDelta=1):
    """
    Return indexes based on Timedeltas' days value consecutively being within a threshold and of a minimum length.

    :param pd.Series series: Sorted series containing Timedeltas.
    :param minimumLength: Minimum length of list to be returned.
    :param maximumDelta: Maximum difference to be considered consecutive.
    """
    indexes = []
    previousValue = None
    for position, (i, timedelta) in enumerate(series.items()):
        val = timedelta.days
        previousValueDelta = 0 if previousValue is None else val - previousValue
        isConsecutive = previousValueDelta <= maximumDelta
        # Compare the position, not the label, so non-range indexes are handled.
        lastIndex = position == len(series) - 1

        if isConsecutive or not indexes:
            indexes.append(i)

        satisfiedLength = len(indexes) >= minimumLength
        if (lastIndex or not isConsecutive) and satisfiedLength:
            return indexes

        # Start a new candidate run at the current label.
        if not isConsecutive:
            indexes = [i]

        previousValue = val
def createTimedeltaSeries(l):
    """Easily create a series containing Timedeltas."""
    return pd.Series([dt.timedelta(days=value) for value in l])
assert [0, 1, 2, 3, 4, 5, 6, 7] == getConsecutiveIndexes(createTimedeltaSeries([0, 0, 0, 1, 1, 2, 3, 3, 7, 9]))
assert [1, 2] == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]))
assert [0, 1, 2, 3] == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]), maximumDelta=2)
assert [2, 3, 4] == getConsecutiveIndexes(createTimedeltaSeries([1, 2, 4, 5, 6]), minimumLength=3)
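As an alternative to looping, the leading run can also be found with vectorized pandas operations. This is only a sketch under a few assumptions: the series is sorted ascending, no minimum run length is enforced, and the names `first_run_labels_vectorized` and `max_step` are made up for illustration. It marks every step larger than `max_step` as a break, numbers the runs with a cumulative sum, and keeps the labels of run 0.

```python
import pandas as pd


def first_run_labels_vectorized(series, max_step=pd.Timedelta(days=1)):
    """Return the index labels of the leading run whose successive
    differences never exceed max_step (hypothetical helper)."""
    # diff() gives NaT for the first element; NaT > max_step is False.
    breaks = series.diff() > max_step
    # Each break starts a new run, so the leading run has id 0.
    run_id = breaks.cumsum()
    return run_id[run_id == 0].index.tolist()


# Data from the question, plus a stand-in DataFrame sharing the same index.
time_diff = pd.Series(
    [pd.Timedelta(days=d) for d in [0, 0, 0, 1, 1, 2, 3, 3, 18, 21, 22, 23, 24]],
    index=[5072, 5085, 5107, 5126, 5169, 5211, 5222, 5247,
           5287, 5310, 5333, 5381, 5419],
)
df = pd.DataFrame({"value": range(len(time_diff))}, index=time_diff.index)

labels = first_run_labels_vectorized(time_diff)
subset = df.loc[labels]  # slice the DataFrame by the returned labels
print(subset)
```

Because the result is a list of index labels rather than positions, `df.loc` is the appropriate accessor for the final slice.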