
Merge two lists of tuples based on the tuples' values

I have two lists, ref_list and data_list, each containing tuples whose first element is a time in seconds and whose second element is an arbitrary value:

ref_list = [(1,value_ref_1),(3,value_ref_3),(4,value_ref_4), ... ]
data_list = [(1,value_dat_1),(2,value_dat_2),(4,value_dat_4), ... ]

I want to compute the difference of the second values as a function of time (the first value of the tuples), that is, a list of tuples whose first value is a time and whose second value is the difference of the second values. I also want it to handle missing data in either list by using the last known value. For the previous example, the result would be:

res_list = [(1,value_dat_1-value_ref_1),(2,value_dat_2-value_ref_1),(3,value_dat_2-value_ref_3),(4,value_dat_4-value_ref_4), ... ]

In this example, the tuple (2,value_dat_2-value_ref_1) was created from the tuples (2,value_dat_2) and (1,value_ref_1) because ref_list has no tuple with 2 as its first element. The same idea applies the other way around for (3,value_dat_2-value_ref_3).
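A minimal sketch of the behaviour described above, assuming times are unique within each list and values support subtraction. The function name diff_with_carry is hypothetical. It walks the union of all timestamps in order and carries the last seen value of each list forward:

```python
def diff_with_carry(ref_list, data_list):
    """Return [(t, data - ref)] for every time present in either list,
    reusing the last known value when one list has no entry at t."""
    ref = dict(ref_list)    # time -> reference value
    data = dict(data_list)  # time -> data value
    last_ref = last_dat = None
    res = []
    for t in sorted(ref.keys() | data.keys()):
        # Update the carried-forward value only if this list has an entry at t
        if t in ref:
            last_ref = ref[t]
        if t in data:
            last_dat = data[t]
        # Skip leading times where one list has not produced a value yet
        if last_ref is not None and last_dat is not None:
            res.append((t, last_dat - last_ref))
    return res


ref_list = [(1, 10), (3, 30), (4, 40)]
data_list = [(1, 1), (2, 2), (4, 4)]
print(diff_with_carry(ref_list, data_list))
# [(1, -9), (2, -8), (3, -28), (4, -36)]
```

Because the timestamps are sorted explicitly, the input lists do not need to be pre-sorted; if a time appears twice in one list, dict() keeps only the last value for it.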

I can't figure out how to do it with a list comprehension.
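A list comprehension cannot carry state from one iteration to the next, but it can look the carried value up instead. A sketch using the standard bisect module, assuming both lists are sorted by time with unique times; value_at is a hypothetical helper name. Times earlier than the first entry of either list are skipped, since no previous value exists there:

```python
from bisect import bisect_right


def value_at(pairs, times, t):
    """Value of the last (time, value) pair in pairs whose time is <= t."""
    return pairs[bisect_right(times, t) - 1][1]


ref_list = [(1, 10), (3, 30), (4, 40)]
data_list = [(1, 1), (2, 2), (4, 4)]

ref_times = [t for t, _ in ref_list]
dat_times = [t for t, _ in data_list]
start = max(ref_times[0], dat_times[0])  # first time where both lists have a value

res_list = [
    (t, value_at(data_list, dat_times, t) - value_at(ref_list, ref_times, t))
    for t in sorted(set(ref_times) | set(dat_times))
    if t >= start
]
print(res_list)
# [(1, -9), (2, -8), (3, -28), (4, -36)]
```

Each lookup is a binary search, so this costs O(log n) per timestamp rather than a linear scan back through the list.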

I hope I was clear enough.

Thanks a lot.

Edit 1: IndexError: if both lists have the same length, you shouldn't get an IndexError. data_list[i] gives the ith element of data_list, regardless of its content. And when you pop a value from a Python list, the later elements 'move' down one index, so you don't get a gap in the indexes (unlike some other languages). Or maybe I didn't understand your concern.
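A quick check of the point about pop: removing an element from a Python list shifts every later element down by one, so the indexes stay contiguous.

```python
values = [(1, 'a'), (2, 'b'), (4, 'c')]
removed = values.pop(1)  # remove the element at index 1
print(removed)    # (2, 'b')
print(values)     # [(1, 'a'), (4, 'c')]
print(values[1])  # (4, 'c'): the former index-2 element is now at index 1
```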

Missing data: yes. In that case you need to return multiple values for a missing one: the lower and the upper bounds.

[(elt[0],data_list[i][1]-elt[1]) if data_list[i][0]==elt[0] else ((elt[0],data_list[i][1]-ref_list[i-1][1]),(elt[0],data_list[i][1]-ref_list[i+1][1])) for i,elt in enumerate(ref_list)]

This way, if a value is missing, it looks up the previous and the next reference values, so you get the bounds of the missing value. I have no choice but to return the 'else' tuples in another structure, because each iteration of this comprehension can produce only one value (otherwise you get SyntaxError: invalid syntax at the 'for').
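As an aside, a comprehension can emit more than one item per iteration if it has a second for clause over a small inner tuple, which avoids the tuples of tuples. A sketch of the same idea as the one-liner above (parallel lists; a mismatch is bracketed by the neighbouring reference values), on made-up sample data, with index guards so the first and last positions cannot wrap around or run off the end:

```python
data_list = [(1, 5), (2, 7), (4, 9)]
ref_list = [(1, 1), (3, 2), (4, 3)]

res = [
    item
    for i, elt in enumerate(ref_list)
    for item in (
        # matching timestamps: one exact difference
        ((elt[0], data_list[i][1] - elt[1]),)
        if data_list[i][0] == elt[0]
        # mismatch: differences against the previous and next reference values
        else tuple(
            (elt[0], data_list[i][1] - ref_list[j][1])
            for j in (i - 1, i + 1)
            if 0 <= j < len(ref_list)
        )
    )
]
print(res)
# [(1, 4), (3, 6), (3, 4), (4, 6)]
```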

Even if you need these tuples of tuples (to detect that a value is missing), you might want to know about another solution: an explicit generator.

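def generator_stuff(data_list, ref_list):
    # Walk both lists in parallel, index by index
    for i, elt in enumerate(ref_list):
        if data_list[i][0] == elt[0]:
            # Same time in both lists: yield the plain difference
            yield (elt[0], data_list[i][1] - elt[1])
        else:
            # Time mismatch: yield the bounds against the neighbouring
            # reference values (skipping neighbours that don't exist)
            if i > 0:
                yield (elt[0], data_list[i][1] - ref_list[i-1][1])
            if i + 1 < len(ref_list):
                yield (elt[0], data_list[i][1] - ref_list[i+1][1])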

I have no idea how this performs, but since it yields each tuple individually, you won't get tuples of tuples.
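To get a rough idea of the performance, the standard timeit module can time both styles. A minimal sketch on synthetic data whose timestamps already match (so only the first branch is exercised and the time check is dropped); as_list and as_gen are hypothetical names, and the timings will vary by machine:

```python
import timeit

n = 50_000
# synthetic data: identical timestamps in both lists (an assumption)
ref_list = [(t, t * 2) for t in range(n)]
data_list = [(t, t * 3) for t in range(n)]


def as_list():
    # one tuple per pair, built eagerly by a list comprehension
    return [(r[0], d[1] - r[1]) for r, d in zip(ref_list, data_list)]


def as_gen():
    # same tuples, yielded lazily one at a time
    for r, d in zip(ref_list, data_list):
        yield (r[0], d[1] - r[1])


assert as_list() == list(as_gen())  # both produce the same result
print("list comprehension:", timeit.timeit(as_list, number=5))
print("generator + list():", timeit.timeit(lambda: list(as_gen()), number=5))
```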

I additionally ran the following with two lists of 500k values each; memory usage stayed stable at 100 MB/200 MB (depending on generation parameters).

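list_a = [(1,222),(2,444),(5,666),(10,888)]
list_b = [(1,111),(3,333),(7,555),(9,777),(10,888)]

list_c = []

i = 1     # current time step (assumes integer times starting at 1)
a = None  # last known (time, value) from list_a
b = None  # last known (time, value) from list_b


def get_new(a, for_time):
    """Pop and return the head of a if it is due at for_time, else None."""
    if len(a) == 0:
        raise IndexError

    # in the future
    if a[0][0] > for_time:
        return None

    return a.pop(0)

list_a_exhausted = False
list_b_exhausted = False

while True:
    try:
        a = get_new(list_a, i) or a
    except IndexError:
        list_a_exhausted = True

    try:
        b = get_new(list_b, i) or b
    except IndexError:
        list_b_exhausted = True

    if list_a_exhausted and list_b_exhausted:
        break

    # Only emit a difference once both lists have produced a value
    if a is not None and b is not None:
        list_c.append((i, b[1] - a[1]))
    i = i + 1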
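The loop above calls list.pop(0), which shifts every remaining element and is therefore O(n) per call; with 500k-element lists, a collections.deque (whose popleft() is O(1)) is likely a better fit. A sketch of the same time-stepping idea on deques, assuming integer times and input lists sorted by time; step_diff is a hypothetical name. It consumes every entry due at time t rather than one per step, and skips times before both lists have a value:

```python
from collections import deque


def step_diff(list_a, list_b, start=1):
    """For each integer time from start until both inputs are used up,
    emit (t, last_b - last_a), carrying the last seen value of each list."""
    qa, qb = deque(list_a), deque(list_b)
    a = b = None
    out = []
    t = start
    while qa or qb:
        # consume every entry that is due at (or before) time t
        while qa and qa[0][0] <= t:
            a = qa.popleft()
        while qb and qb[0][0] <= t:
            b = qb.popleft()
        if a is not None and b is not None:
            out.append((t, b[1] - a[1]))
        t += 1
    return out


list_a = [(1, 222), (2, 444), (5, 666), (10, 888)]
list_b = [(1, 111), (3, 333), (7, 555), (9, 777), (10, 888)]
print(step_diff(list_a, list_b))
# [(1, -111), (2, -333), (3, -111), (4, -111), (5, -333), (6, -333), (7, -111), (8, -111), (9, 111), (10, 0)]
```

deque(list_a) copies the input, so the original lists are left untouched, unlike the pop(0) version.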

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM