I have a large list of continuous data, and I'm trying to find where the data increases, and where it decreases, for at least a minimum number of consecutive entries. For example, given the list
[0, 1, 3, 8, 10, 13, 13, 8, 4, 11, 5, 1, 0]
I want to capture the runs 0, 1, 3, 8, 10, 13, 13 and 11, 5, 1, 0 but not the run 8, 4 (because its length is less than the arbitrary minimum of 3).
Currently I'm using ascending and descending functions that capture a fixed number of entries of a run at a time (for example, 0, 1, 3 and then 1, 3, 8), but that doesn't give me the entire run in a single list.
Any ideas on how to solve this problem?
Monotonic without overlaps:
This version finds monotonic sequences and doesn't register overlaps; sorry for not having paid attention initially.
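def find_sequences(lst, min_len=3):
    curr = []   # the run currently being built
    asc = None  # True if curr ascends, False if it descends, None while undecided
    for i in lst:
        # Extend the run while it has fewer than two items (direction not fixed yet)
        # or while i continues the run's direction; equal neighbours are allowed.
        if (not curr or len(curr) == 1
                or (asc and i >= curr[-1])
                or (not asc and i <= curr[-1])):
            if len(curr) == 1:
                asc = curr[-1] < i  # the second item fixes the direction (a tie counts as descending)
            curr.append(i)
        else:
            # Direction broken: emit the run if it is long enough, then start a new run at i
            if len(curr) >= min_len:
                yield curr
            asc = None
            curr = [i]
    # Emit the final run
    if len(curr) >= min_len:
        yield curr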
With x set to the example list, list(find_sequences(x)) yields:
[[0, 1, 3, 8, 10, 13, 13], [11, 5, 1, 0]]
with performance:
In [6]: timeit list(find_sequences(x))
100000 loops, best of 3: 8.44 µs per loop
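Since the question asks both where the data increases and where it decreases, it may help to label each run with its direction. A minimal sketch, assuming the same rules as find_sequences above; the name find_labeled_sequences and the "up"/"down" labels are hypothetical:

```python
from typing import Iterator, List, Optional, Sequence, Tuple


def find_labeled_sequences(lst: Sequence[int], min_len: int = 3) -> Iterator[Tuple[str, List[int]]]:
    """Yield ("up" | "down", run) pairs for disjoint monotonic runs (hypothetical helper)."""
    curr: List[int] = []
    asc: Optional[bool] = None  # None until the run has two items
    for i in lst:
        if len(curr) < 2 or (asc and i >= curr[-1]) or (not asc and i <= curr[-1]):
            if len(curr) == 1:
                asc = curr[-1] < i  # a tie between the first two items counts as "down"
            curr.append(i)
        else:
            if len(curr) >= min_len:
                yield ("up" if asc else "down", curr)
            asc = None
            curr = [i]
    if len(curr) >= min_len:
        # A final single-item run (only possible with min_len <= 1) is labeled "down"
        yield ("up" if asc else "down", curr)
```

For the example list this gives an "up" run [0, 1, 3, 8, 10, 13, 13] and a "down" run [11, 5, 1, 0]; lowering min_len to 2 also reports the "down" run [8, 4].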
Monotonic/non-monotonic with overlaps:
This function finds monotonic, overlapping sequences. You can easily change it to find strictly monotonic sequences by changing >= and <= to > and < respectively, or even make it parametrizable.
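def find_sequences(lst, min_len=3):
    asc, desc = [], []  # ascending and descending runs, tracked in parallel
    for i in lst:
        # Ascending run: extend it, or emit it (if long enough) and restart at i
        if not asc or i >= asc[-1]:
            asc.append(i)
        else:
            if len(asc) >= min_len:
                yield asc
            asc = [i]
        # Descending run: same logic with the comparison reversed
        if not desc or i <= desc[-1]:
            desc.append(i)
        else:
            if len(desc) >= min_len:
                yield desc
            desc = [i]
    # Emit whichever runs are still open at the end of the data
    if len(desc) >= min_len:
        yield desc
    if len(asc) >= min_len:
        yield asc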
With x set to the example list, list(find_sequences(x)) yields:
[[0, 1, 3, 8, 10, 13, 13], [13, 13, 8, 4], [11, 5, 1, 0]]
with performance:
In [3]: timeit list(find_sequences(x))
100000 loops, best of 3: 10.5 µs per loop
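A minimal sketch of the parametrizable version mentioned above. The strict keyword is a hypothetical addition, not part of the original answer; it simply swaps the comparisons using functions from the standard operator module:

```python
import operator
from typing import Iterator, List, Sequence


def find_sequences(lst: Sequence[int], min_len: int = 3, strict: bool = False) -> Iterator[List[int]]:
    """Overlapping ascending/descending runs; strict=True uses > and < instead of >= and <=."""
    up = operator.gt if strict else operator.ge      # test for continuing an ascending run
    down = operator.lt if strict else operator.le    # test for continuing a descending run
    asc: List[int] = []
    desc: List[int] = []
    for i in lst:
        if not asc or up(i, asc[-1]):
            asc.append(i)
        else:
            if len(asc) >= min_len:
                yield asc
            asc = [i]
        if not desc or down(i, desc[-1]):
            desc.append(i)
        else:
            if len(desc) >= min_len:
                yield desc
            desc = [i]
    if len(desc) >= min_len:
        yield desc
    if len(asc) >= min_len:
        yield asc
```

With the default strict=False it behaves like the function above. With strict=True the repeated 13 ends the ascending run, so the example list gives [[0, 1, 3, 8, 10, 13], [13, 8, 4], [11, 5, 1, 0]].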
The following should work. It breaks the data into disjoint monotonic subsequences and then filters them by your length criterion.
def get_monotonic_subsequences(data, min_length):
    if len(data) < 2:  # too short to determine a direction
        return [list(data)] if len(data) >= max(min_length, 1) else []
    direction = data[1] - data[0]  # determine direction of initial subsequence
    subsequences = []
    cur_seq = [data[0]]
    # Start at index 1 so that data[i-1] never wraps around to data[-1]
    for i in range(1, len(data)):
        if direction > 0:  # currently ascending
            if data[i] >= data[i-1]:
                cur_seq.append(data[i])
            else:
                subsequences.append(cur_seq)
                cur_seq = [data[i]]
                if i + 1 < len(data):
                    direction = data[i+1] - data[i]
        else:  # currently descending (a flat start counts as descending)
            if data[i] <= data[i-1]:
                cur_seq.append(data[i])
            else:
                subsequences.append(cur_seq)
                cur_seq = [data[i]]
                if i + 1 < len(data):
                    direction = data[i+1] - data[i]
    subsequences.append(cur_seq)  # the last run is closed by the end of the data
    return [x for x in subsequences if len(x) >= min_length]
As an aside, it's not clear from your question, but your output suggests that you expect the subsequences to be collected greedily from left to right, which this code assumes.
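Because the list is large and the goal is to find where the runs are, returning index ranges instead of copied lists may be more convenient. This is a sketch under the same greedy left-to-right assumption (a run's direction is fixed by its first two items, a tie counts as descending, and equal neighbours continue a run); find_run_indices is a hypothetical name:

```python
from typing import Iterator, Sequence, Tuple


def find_run_indices(data: Sequence[float], min_len: int = 3) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) index pairs of disjoint monotonic runs (hypothetical helper)."""
    n = len(data)
    start = 0
    while start < n:
        end = start + 1
        if end < n:
            ascending = data[end] > data[start]  # the first two items fix the direction
            # Advance end while the next item continues the run (ties allowed)
            while end < n and (data[end] >= data[end - 1] if ascending else data[end] <= data[end - 1]):
                end += 1
        if end - start >= min_len:
            yield start, end
        start = end  # the item that broke the run starts the next one
```

Slicing data[start:end] recovers each run; for the example list it returns [(0, 7), (9, 13)], i.e. the runs 0, 1, 3, 8, 10, 13, 13 and 11, 5, 1, 0.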