
Remove overlapping spans from a list of tuples based on span length in Python

I have a list of tuples representing character spans, and some of the spans overlap. I want to modify the list so that, wherever spans overlap, only the larger span is kept and the smaller one is deleted.

Example:

Original list: [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
Modified list: [(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]

Here (10, 11) was removed because it overlaps the larger span (10, 12), (16, 17) because it overlaps (15, 17), and (20, 21) and (21, 28) because they overlap (20, 29).

I found some answers that deal with overlapping spans like these, but none of them handle keeping the larger span.

My idea was to sort the spans by length (end minus start) in descending order and then search for overlaps, but I cannot figure out how to do that overlap search.
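The sort-by-length idea can be completed with a greedy pass: visit spans from longest to shortest and keep a span only if it does not overlap any span already kept. The sketch below assumes half-open character spans (start inclusive, end exclusive, as Python slicing uses), so (10, 12) and (12, 14) do not count as overlapping; if your spans are inclusive on both ends, change `<` to `<=` in `overlaps`. The names `overlaps` and `keep_longest_spans` are hypothetical helpers, and equal-length ties are broken by the earlier start, which is an assumption the question does not specify.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if half-open spans [a0, a1) and [b0, b1) share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def keep_longest_spans(spans: List[Span]) -> List[Span]:
    """Greedily keep the longest spans, dropping any span that overlaps one already kept."""
    kept: List[Span] = []
    # Longest first; ties broken by earlier start so the result is deterministic.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    # Restore left-to-right order for the output.
    return sorted(kept)


A = [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
print(keep_longest_spans(A))
# [(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]
```

Checking each candidate against every kept span is O(n²) in the worst case, which is fine for short lists; unlike a containment-only filter, this also resolves partial overlaps, e.g. (0, 5) is dropped in favour of the longer (4, 20).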

Code

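A = [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
# Map each start position to the largest end seen for that start
tempD = {}
for start, end in A:
    # Skip this span if a span with the same start and an equal or larger end is already stored
    if start in tempD and end <= tempD[start]:
        continue
    tempD[start] = end
# Dicts keep insertion order (Python 3.7+), so the output follows the input order
output = list(tempD.items())
print(output)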

Output

[(2, 3), (7, 9), (10, 12), (15, 17), (16, 17), (20, 29), (21, 28)]

Code explanation

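Here, the first thing we do is store the spans in a dictionary keyed by their start position. Because each key can appear only once, spans that share a start collapse into a single entry, and the check ensures that only the larger end is kept. Note that this removes overlaps only between spans with the same start: nested spans that start elsewhere, such as (16, 17) inside (15, 17) and (21, 28) inside (20, 29), remain in the output.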

After building the dictionary, we convert it back into the wanted format, a list of tuples. This could be shortened considerably, but the longer form should be easier to follow. Let me know if anything is unclear.
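Keying the dictionary on the start position only merges spans with the same start. One way to extend the idea to spans nested anywhere, sketched below with a hypothetical function name `remove_nested_spans`, is to sort by start ascending and end descending and sweep once while tracking the largest end seen: a span whose end does not exceed that value lies inside an earlier span. This handles the question's example in O(n log n), but it only removes nested spans; partially overlapping spans such as (0, 5) and (4, 20) are both kept, so it fits data where every overlap is a nesting.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def remove_nested_spans(spans: List[Span]) -> List[Span]:
    """Drop every span lying inside another span; of identical spans, keep the first."""
    kept: List[Span] = []
    max_end = float("-inf")  # largest end seen so far in the sweep
    # Sort by start ascending, then end descending, so an enclosing span
    # always comes before the spans nested inside it.
    for start, end in sorted(spans, key=lambda s: (s[0], -s[1])):
        if end <= max_end:
            continue  # contained in an earlier span: skip it
        kept.append((start, end))
        max_end = end
    return kept


A = [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
print(remove_nested_spans(A))
# [(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]
```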
