
How to get rid of sub-tuples in this list?

list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]

Since (0,2) and (4,6) both lie within the index range of (0,6), I want to remove them. The resulting list would be:

list_of_tuple = [(0,6), (6,7), (8,9)]

It seems I need to sort this list of tuples somehow to make the removal easier. But how do I sort a list of tuples?

Given two tuples of array indexes, [m,n] and [a,b], if:

m >= a and n <= b

then [m,n] is included in [a,b], so [m,n] should be removed from the list.
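
The containment rule above can be written as a small predicate. This is only a minimal sketch: the name is_within is illustrative and not part of the original post, and both comparisons are taken as inclusive, as in the condition above. It also shows that sorted() already orders tuples by first element, then by second.

```python
def is_within(inner, outer):
    """Return True if interval `inner` lies inside interval `outer`."""
    m, n = inner
    a, b = outer
    return m >= a and n <= b

list_of_tuple = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]

# Tuples compare element by element, so sorted() orders by start, then by end.
print(sorted(list_of_tuple))

print(is_within((0, 2), (0, 6)))  # True
print(is_within((6, 7), (0, 6)))  # False
```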

To remove every tuple in list_of_tuple whose range lies within a specified tuple:

list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]

def rm(lst,tup):
    return [tup]+[t for t in lst if t[0] < tup[0] or t[1] > tup[1]]

print(rm(list_of_tuple,(0,6)))

Output:

[(0, 6), (6, 7), (8, 9)]

Here's a dead-simple solution, but it's O(n^2):

intervals = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]  # list_of_tuple
result = [
    t for t in intervals
    if not any(t != u and t[0] >= u[0] and t[1] <= u[1] for u in intervals)
    ]

It filters out every interval that is contained in, but not equal to, some other interval. Exact duplicates are therefore all kept.

This seems like an opportunity to abuse both reduce() and Python's logical operators. The solution assumes the list is sorted as in the OP's example: primarily on the second element of each tuple and secondarily on the first:

from functools import reduce

list_of_sorted_tuples = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]

def contains(a, b):
    return a[0] >= b[0] and a[1] <= b[1] and [b] or b[0] >= a[0] and b[1] <= a[1] and [a] or [a, b]

reduced_list = reduce(lambda x, y: x[:-1] + contains(x[-1], y) if x else [y], list_of_sorted_tuples, [])

print(reduced_list)

OUTPUT

> python3 test.py
[(0, 6), (6, 7), (8, 9)]
>

You could try something like this to check if both ends of the (half-open) interval are contained within another interval:

list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
reduced_list = []
for t in list_of_tuple:
    add = True
    for o in list_of_tuple:
        if t is not o:
            r = range(*o)
            if t[0] in r and (t[1] - 1) in r:
                add = False

    if add:
        reduced_list.append(t)

print(reduced_list) # [(0, 6), (6, 7), (8, 9)]

Note: This assumes that your tuples are half-open intervals, i.e. [0, 6) where 0 is inclusive but 6 is exclusive, similar to how range treats its start and stop parameters. A couple of small changes would have to be made for fully closed intervals:

range(*o) -> range(o[0], o[1] + 1)

and

if t[0] in r and (t[1] - 1) in r: -> if t[0] in r and t[1] in r:
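
Putting those two changes together, a closed-interval version might look like the sketch below. The function name drop_contained_closed is made up for illustration, the endpoints are assumed to be integers (range requires them), and items are compared by position rather than with `is`, which is a small departure from the loop above.

```python
def drop_contained_closed(intervals):
    """Drop every closed interval [start, end] that lies inside another one."""
    kept = []
    for i, t in enumerate(intervals):
        add = True
        for j, o in enumerate(intervals):
            if i != j:
                r = range(o[0], o[1] + 1)  # include the right endpoint
                if t[0] in r and t[1] in r:
                    add = False
                    break
        if add:
            kept.append(t)
    return kept

print(drop_contained_closed([(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]))
# [(0, 6), (6, 7), (8, 9)]
```

Note that with this positional comparison, two identical intervals would each count as contained in the other, so both would be dropped.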

Here is the first step towards a solution that can be done in O(n log(n)):

def non_cont(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    i = 1
    while i < len(s):
        if s[i][0] >= s[i - 1][0] and s[i][1] <= s[i - 1][1]:
            del s[i]
        else:
            i += 1
    return s
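
To see what the sort key in non_cont does, here is a quick check on a shuffled copy of the example data (the shuffled order is just an assumption for illustration). Starts ascend, and for equal starts the longer interval comes first:

```python
intervals = [(4, 6), (0, 2), (8, 9), (0, 6), (6, 7)]

# Sort by start ascending; ties on start are broken by end descending.
ordered = sorted(intervals, key=lambda t: (t[0], -t[1]))
print(ordered)  # [(0, 6), (0, 2), (4, 6), (6, 7), (8, 9)]
```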

The idea is that after sorting with this key function (start ascending, then end descending), every element that is contained in some other element comes after an element that contains it. We then sweep the list, removing each element that is contained in the nearest preceding element that survived. The sweep makes a single pass, but each del on a list is itself O(n), so the sweep-and-delete loop is O(n^2) in the worst case. The above solution is for clarity more than anything else. We can move to the next implementation:

def non_cont_on(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    result = s[:1]
    for t in s[1:]:
        if not (t[0] >= result[-1][0] and t[1] <= result[-1][1]):
            result.append(t)
    return result

There is no quadratic sweep and delete loop here, only a nice, linear process of constructing the result. Space complexity is O(n). It is possible to perform this algorithm without extra, non-constant, space, but I will leave this out.
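
For what it's worth, one way the constant-extra-space variant could look is sketched below. This is an assumption about what was meant, not the answerer's code: the name non_cont_inplace is invented, the input list is modified in place, and list.sort may still use temporary memory internally.

```python
def non_cont_inplace(lot):
    """Remove contained intervals from `lot` in place; `lot` ends up sorted."""
    lot.sort(key=lambda t: (t[0], -t[1]))
    write = 0  # index of the last interval kept so far
    for read in range(1, len(lot)):
        cur, last = lot[read], lot[write]
        if not (cur[0] >= last[0] and cur[1] <= last[1]):
            write += 1
            lot[write] = cur  # compact kept intervals toward the front
    del lot[write + 1:]  # drop the leftover tail in one step
    return lot

data = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]
print(non_cont_inplace(data))  # [(0, 6), (6, 7), (8, 9)]
```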

A side effect of both algorithms is that the intervals are sorted.
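
If the original order matters, one option is to use the sorted sweep only to decide which intervals survive and then filter the input in its own order. The sketch below assumes that approach; the name non_cont_keep_order is hypothetical, and it keeps one copy of each surviving interval.

```python
def non_cont_keep_order(lot):
    """Like the sorted sweep, but return survivors in their original order."""
    kept = set()
    last = None  # most recent interval that survived the sweep
    for t in sorted(lot, key=lambda t: (t[0], -t[1])):
        if last is None or not (t[0] >= last[0] and t[1] <= last[1]):
            kept.add(t)
            last = t
    result = []
    seen = set()
    for t in lot:  # second pass restores the input order
        if t in kept and t not in seen:
            seen.add(t)
            result.append(t)
    return result

print(non_cont_keep_order([(8, 9), (4, 6), (0, 6), (6, 7), (0, 2)]))
# [(8, 9), (0, 6), (6, 7)]
```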

If you want to preserve information about the inclusion structure (that is, which enclosing interval absorbs each interval of the original set), you can build a "one-level tree":

def contained(tpl1, tpl2):
    return tpl1[0] >= tpl2[0] and tpl1[1] <= tpl2[1] 

def interval_hierarchy(lst):
    if not lst:
        return {}, []
    pending = list(lst)  # work on a copy so the caller's list is not emptied
    children_dict = {pending.pop(): []}
    while pending:
        t = pending.pop()
        if t in children_dict:  # exact duplicate of an existing root
            continue
        absorbed = False  # set once t is found inside an existing root
        for k in list(children_dict):
            if contained(k, t):
                # t encloses root k: t becomes a root and adopts k and k's children
                children_dict[t] = children_dict.get(t, []) + [k, *children_dict.pop(k)]
            elif contained(t, k):
                # root k encloses t: record t as a child of k
                children_dict[k].append(t)
                absorbed = True
        if not absorbed:
            children_dict.setdefault(t, [])
    # return whatever information you might want to use
    return children_dict, list(children_dict)

It appears you are trying to merge intervals which are overlapping. For example, (9,11), (10,12) are merged in the second example below to produce (9,12).

In that case, a plain call to sorted is enough, because tuples are compared element by element (by start, then by end).

Approach: Keep track of the next interval to be added, (x0, x1). Keep extending its end until you reach an interval whose start is at or after (>=) x1. At that point the stored interval can be appended to the results, and the new interval becomes the stored one. After the loop, append the stored interval once more so that the last one is not lost.

def merge_intervals(val_input):
    if not val_input:
        return []
    vals_sorted = sorted(val_input)   # sorts by tuple values "natural ordering"
    result = []
    x0, x1 = vals_sorted[0]           # store next interval to be added as (x0, x1)
    for start, end in vals_sorted[1:]:
        if start >= x1:               # reached next separate interval
            result.append((x0, x1))
            x0, x1 = (start, end)
        elif end > x1:
            x1 = end                  # extend length of next interval to be added
    result.append((x0, x1))
    return result

print(merge_intervals([(0,2), (0,6), (4,6), (6,7), (8,9)]))
print(merge_intervals([(1,2), (9,11), (10,12), (1,7)]))

Output:

[(0, 6), (6, 7), (8, 9)]
[(1, 7), (9, 12)]
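
Merging and removing contained intervals give different answers whenever two intervals overlap without one containing the other, as the second example shows. The side-by-side sketch below uses two illustrative helpers, drop_contained and merge_overlaps; both names are invented here, and the merge helper is a condensed variant of the function above rather than a copy of it.

```python
def drop_contained(intervals):
    """Keep intervals that are not inside a different interval (quadratic scan)."""
    return [
        t for t in intervals
        if not any(t != u and t[0] >= u[0] and t[1] <= u[1] for u in intervals)
    ]

def merge_overlaps(intervals):
    """Merge overlapping intervals after sorting; touching intervals stay separate."""
    result = []
    for start, end in sorted(intervals):
        if result and start < result[-1][1]:
            # Overlaps the last stored interval: extend its end if needed.
            result[-1] = (result[-1][0], max(result[-1][1], end))
        else:
            result.append((start, end))
    return result

data = [(1, 2), (9, 11), (10, 12), (1, 7)]
print(drop_contained(data))  # [(9, 11), (10, 12), (1, 7)]
print(merge_overlaps(data))  # [(1, 7), (9, 12)]
```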
