Python: finding the first set of consecutive numbers (including repeated numbers)
I have the following data.
{5072: Timedelta('0 days 00:00:00'), 5085: Timedelta('0 days 00:00:00'), 5107: Timedelta('0 days 00:00:00'), 5126: Timedelta('1 days 00:00:00'), 5169: Timedelta('1 days 00:00:00'), 5211: Timedelta('2 days 00:00:00'), 5222: Timedelta('3 days 00:00:00'), 5247: Timedelta('3 days 00:00:00'), 5287: Timedelta('18 days 00:00:00'), 5310: Timedelta('21 days 00:00:00'), 5333: Timedelta('22 days 00:00:00'), 5381: Timedelta('23 days 00:00:00'), 5419: Timedelta('24 days 00:00:00')}
timeDiff
5072 0 days
5085 0 days
5107 0 days
5126 1 days
5169 1 days
5211 2 days
5222 3 days
5247 3 days
5287 18 days
5310 21 days
5333 22 days
5381 23 days
5419 24 days
The Series has dtype timedelta64[ns].
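For a Series like this, I want to get the first set of consecutive numbers, including repeated numbers. In this case it should return the indexes. (Note that I need the indexes because I will later use them to slice a pandas DataFrame.)
I would like the function to return [0 days, 0 days, 0 days, 1 days, 1 days, 2 days, 3 days, 3 days]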
Would a function like this help in your case? It takes a list and finds the consecutive sublist starting from a given start index.
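def get_consecutive_indexs(lst, start_from=0):
    # Despite its name, this returns the values of the run, not their positions.
    num = lst[start_from]
    index = start_from
    # Extend the run while each value equals the previous one or exceeds it by 1.
    while index < len(lst) and (lst[index] - num in (0, 1)):
        num = lst[index]
        index += 1
    return lst[start_from:index]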
If your days are strings of the form "X days", you can do:
get_consecutive_indexs([int(day.split(' ')[0]) for day in days])
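Since the question asks for indexes rather than values, the same idea can be adapted to return positions. The following is only a minimal sketch: the helper name `first_run_positions` and the sample `days` list are illustrative assumptions, not part of the original answer.

```python
def first_run_positions(values, start_from=0):
    """Return the positions of the first run in which each step is 0 or 1."""
    if start_from >= len(values):
        return []
    end = start_from + 1
    # Grow the run while the next value repeats or increases by exactly 1.
    while end < len(values) and values[end] - values[end - 1] in (0, 1):
        end += 1
    return list(range(start_from, end))


# Hypothetical sample input in the "X days" string form mentioned above.
days = ["0 days", "0 days", "0 days", "1 days", "1 days",
        "2 days", "3 days", "3 days", "18 days"]
day_numbers = [int(day.split(" ")[0]) for day in days]
positions = first_run_positions(day_numbers)
print(positions)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

These are positions, not index labels, so they would be used with `.iloc` rather than `.loc` when slicing a DataFrame.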
Here is a function that takes a Series containing Timedeltas. The tests I have provided use a normal index, but it also works with a non-range index.
import pandas as pd
import datetime as dt


def getConsecutiveIndexes(series, minimumLength=2, maximumDelta=1):
    """
    Return indexes based on Timedeltas' days values consecutively being within a threshold and of a minimum length.

    :param pd.Series series: Sorted series containing Timedeltas.
    :param minimumLength: Minimum length of list to be returned.
    :param maximumDelta: Maximum difference to be considered consecutive.
    """
    indexes = []
    previousValue = None
    # Track the position separately so the last-element check also works with non-range indexes.
    for position, (i, timedelta) in enumerate(series.items()):
        val = timedelta.days
        previousValueDelta = 0 if previousValue is None else val - previousValue
        isConsecutive = previousValueDelta <= maximumDelta
        lastIndex = position == len(series) - 1
        if isConsecutive or not indexes:
            indexes.append(i)
        satisfiedLength = len(indexes) >= minimumLength
        if (lastIndex or not isConsecutive) and satisfiedLength:
            return indexes
        if not isConsecutive:
            # The run was too short; start a new one at the current element.
            indexes = [i]
        previousValue = val
def createTimedeltaSeries(l):
    """Easily create a series containing Timedeltas."""
    return pd.Series([dt.timedelta(days=value) for value in l])


assert [0, 1, 2, 3, 4, 5, 6, 7] == getConsecutiveIndexes(createTimedeltaSeries([0, 0, 0, 1, 1, 2, 3, 3, 7, 9]))
assert [1, 2] == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]))
assert [0, 1, 2, 3] == getConsecutiveIndexes(createTimedeltaSeries([2, 4, 5, 7, 10]), maximumDelta=2)
assert [2, 3, 4] == getConsecutiveIndexes(createTimedeltaSeries([1, 2, 4, 5, 6]), minimumLength=3)
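For the data in the question, a vectorized alternative may also be worth considering. The sketch below is an assumption-laden variant rather than the answer's method: the helper name `first_consecutive_labels` is hypothetical, it assumes a sorted, non-empty timedelta Series, and it has no minimum-length rule, so it always returns the run that starts at the first element. It marks a break wherever the day count jumps by more than `maximum_delta`, numbers the runs with a cumulative sum, and keeps run 0.

```python
import pandas as pd


def first_consecutive_labels(series, maximum_delta=1):
    """Return the index labels of the run that starts at the first element."""
    days = series.dt.days
    # True wherever the gap to the previous row exceeds the allowed step.
    starts_new_run = days.diff().fillna(0) > maximum_delta
    # The cumulative sum gives every row the number of the run it belongs to.
    run_id = starts_new_run.cumsum()
    return series[run_id == 0].index.tolist()


# Rebuild the Series from the question (labels and day counts copied from it).
labels = [5072, 5085, 5107, 5126, 5169, 5211, 5222,
          5247, 5287, 5310, 5333, 5381, 5419]
day_counts = [0, 0, 0, 1, 1, 2, 3, 3, 18, 21, 22, 23, 24]
time_diff = pd.Series(pd.to_timedelta(day_counts, unit="D"), index=labels)

first_run = first_consecutive_labels(time_diff)
print(first_run)  # [5072, 5085, 5107, 5126, 5169, 5211, 5222, 5247]

# The labels can then slice a DataFrame that shares the same index.
frame = pd.DataFrame({"timeDiff": time_diff})
subset = frame.loc[first_run]
```

Because the result contains index labels, it works with `.loc` on a non-range index such as the one in the question.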