Determining where in a list ascending or descending stops

I have a large list of continuous data, and I'm trying to find where the data increases for at least a minimum number of entries and where it decreases. For example, given the list

[0, 1, 3, 8, 10, 13, 13, 8, 4, 11, 5, 1, 0]

I want to capture the runs 0, 1, 3, 8, 10, 13, 13 and 11, 5, 1, 0, but not the run 8, 4, because its length is below the arbitrary minimum of 3.

Currently I'm using ascending and descending functions to capture a fixed number of entries at a time (0, 1, 3 and 1, 3, 8, for example), but that doesn't give me each entire run in a single list.

Any ideas on how to solve this problem?

Monotonic without overlaps:

This version finds monotonic sequences and doesn't register overlaps; sorry for not having paid attention initially.

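def find_sequences(lst, min_len=3):
    curr = []    # run currently being built
    asc = None   # True = ascending, False = descending, None = not decided yet
    for i in lst:
        if not curr or asc is None or (asc and i >= curr[-1]) or (not asc and i <= curr[-1]):
            # decide the direction at the first strict change; ties leave it open
            if curr and asc is None and i != curr[-1]:
                asc = curr[-1] < i
            curr.append(i)
        else:
            # direction reversed: report the run if long enough, restart at i
            if len(curr) >= min_len:
                yield curr
            asc = None
            curr = [i]
    if len(curr) >= min_len:
        yield curr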

yields:

[[0, 1, 3, 8, 10, 13, 13], [11, 5, 1, 0]]

with performance:

In [6]: timeit list(find_sequences(x))
100000 loops, best of 3: 8.44 µs per loop

Monotonic (strict or non-strict) with overlaps:

This function finds non-strictly monotonic sequences and lets them overlap; you can easily make it strictly monotonic by changing >= and <= to > and < respectively, or even make that choice a parameter.

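def find_sequences(lst, min_len=3):
    # asc and desc track the current non-decreasing and non-increasing runs
    asc, desc = [], []
    for i in lst:
        if not asc or i >= asc[-1]:
            asc.append(i)
        else:
            # ascending run broken: report it if long enough, restart at i
            if len(asc) >= min_len:
                yield asc
            asc = [i]

        if not desc or i <= desc[-1]:
            desc.append(i)
        else:
            # descending run broken: report it if long enough, restart at i
            if len(desc) >= min_len:
                yield desc
            desc = [i]

    # flush the runs still open at the end of the list
    if len(desc) >= min_len:
        yield desc
    if len(asc) >= min_len:
        yield asc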

yields:

[[0, 1, 3, 8, 10, 13, 13], [13, 13, 8, 4], [11, 5, 1, 0]]

with performance:

In [3]: timeit list(find_sequences(x))
100000 loops, best of 3: 10.5 µs per loop
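
The strict/non-strict switch mentioned above could be made a parameter roughly as follows. This is only a sketch: the name find_sequences_overlapping and the strict keyword are hypothetical and not part of the original answer; the comparison functions come from the standard operator module.

```python
import operator


def find_sequences_overlapping(lst, min_len=3, strict=False):
    """Yield overlapping ascending and descending runs of >= min_len items.

    `strict` is an assumed parameter name: False allows ties (>=, <=),
    True requires every step to change the value (>, <).
    """
    # Choose the comparison pair once, outside the loop.
    up, down = (operator.gt, operator.lt) if strict else (operator.ge, operator.le)
    asc, desc = [], []
    for item in lst:
        if not asc or up(item, asc[-1]):
            asc.append(item)
        else:
            if len(asc) >= min_len:
                yield asc
            asc = [item]
        if not desc or down(item, desc[-1]):
            desc.append(item)
        else:
            if len(desc) >= min_len:
                yield desc
            desc = [item]
    # Flush the runs still open at the end of the list.
    if len(desc) >= min_len:
        yield desc
    if len(asc) >= min_len:
        yield asc


data = [0, 1, 3, 8, 10, 13, 13, 8, 4, 11, 5, 1, 0]
non_strict = list(find_sequences_overlapping(data))
strict = list(find_sequences_overlapping(data, strict=True))
print(non_strict)  # [[0, 1, 3, 8, 10, 13, 13], [13, 13, 8, 4], [11, 5, 1, 0]]
print(strict)      # [[0, 1, 3, 8, 10, 13], [13, 8, 4], [11, 5, 1, 0]]
```

With strict=True the repeated 13 ends the ascending run one element earlier and starts the descending run one element later.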

The following should work: it breaks the data into disjoint monotonic subsequences and then filters them by your length criterion.

def get_monotonic_subsequences(data, min_length):
    if not data:
        return []
    subsequences = []
    cur_seq = [data[0]]
    direction = 0  # sign of the current run: >0 rising, <0 falling, 0 not decided yet
    for prev, cur in zip(data, data[1:]):
        step = cur - prev
        if direction * step >= 0:
            # same direction, a tie, or direction still open: extend the run
            cur_seq.append(cur)
            if direction == 0:
                direction = step
        else:
            # direction reversed: close the run and start a new one at cur
            subsequences.append(cur_seq)
            cur_seq = [cur]
            direction = 0
    subsequences.append(cur_seq)
    return [x for x in subsequences if len(x) >= min_length]

As an aside, it's not clear from your question, but your output suggests that you expect the subsequences to be collected greedily from left to right, which this code assumes.
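
To illustrate what greedy left-to-right collection means in practice, here is a minimal sketch that reports where each run starts and stops instead of copying the values. The function name monotonic_run_bounds and the half-open (start, stop) convention are assumptions made for this example; it also assumes the items support subtraction (ints or floats).

```python
def monotonic_run_bounds(data, min_length=3):
    """Return (start, stop) index pairs of disjoint monotonic runs.

    Runs are collected greedily from left to right, and data[start:stop]
    is the run itself. Ties extend the current run.
    """
    bounds = []
    start = 0
    direction = 0  # >0 rising, <0 falling, 0 not decided yet
    for i in range(1, len(data)):
        step = data[i] - data[i - 1]
        if direction * step < 0:
            # Direction reversed: the current run ends just before index i.
            if i - start >= min_length:
                bounds.append((start, i))
            start, direction = i, 0
        elif direction == 0:
            # First non-tie step of a run fixes its direction.
            direction = step
    # Close the run that is still open at the end of the data.
    if len(data) - start >= min_length:
        bounds.append((start, len(data)))
    return bounds


data = [0, 1, 3, 8, 10, 13, 13, 8, 4, 11, 5, 1, 0]
bounds = monotonic_run_bounds(data)
print(bounds)  # [(0, 7), (9, 13)]
```

Because the runs are disjoint and taken greedily, a turning point belongs to the run on its left: monotonic_run_bounds([1, 2, 3, 2, 1]) gives [(0, 3)], since the peak 3 is consumed by the rising run and the remaining 2, 1 is too short.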
