Efficiently comparing the first item in each list of two large lists of lists?

Question

I'm currently working with a large list of lists (~280k lists) and a smaller one (~3.5k lists). I'm trying to efficiently compare the first element of each list in the smaller list to the first element of each list in the large list. If they match, I want to return both lists, the one from the small list and the one from the large list, that share that first element.

For example:

Large List 1:

[[a,b,c,d],[e,f,g,h],[i,j,k,l],[m,n,o,p]]

Smaller list 2:

[[e,q,r,s],[a,t,w,s]]

Would return

[([e,q,r,s],[e,f,g,h]),([a,t,w,s],[a,b,c,d])]

I currently have it set up as shown here, where a list of tuples is returned with each tuple holding the two lists that have a matching first element. I'm fine with any other data structures being used. I was trying to use a set of tuples but couldn't figure out how to make it quicker than what I already have.

My code to compare these two lists of lists is currently this:

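def find_matches(small_list, large_list):
    match = []
    for list_one in small_list:
        for list_two in large_list:
            # case-insensitive substring test on the first elements
            if str(list_one[0]).lower() in str(list_two[0]).lower():
                match.append((list_one, list_two))
                break  # stop at the first match for this small list
    return match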

Answer 1

Assuming order doesn't matter, I would highly recommend using a dictionary that maps each first item (the prefix, one character here) to its list, and then a set intersection of the keys to find the matches:

# generation of data... not important
>>> lst1 = [list(c) for c in ["abcd", "efgh", "ijkl", "mnop"]]
>>> lst2 = [list(c) for c in ["eqrs", "atws"]]

# mapping prefix to list (assuming uniqueness)
>>> by_prefix1 = {chars[0]: chars for chars in lst1}
>>> by_prefix2 = {chars[0]: chars for chars in lst2}

# actually finding matches by intersecting sets (fast)
>>> common = set(by_prefix1.keys()) & set(by_prefix2.keys())
>>> tuples = tuple(((by_prefix1[k], by_prefix2[k]) for k in common))
>>> tuples
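
The dictionary approach above assumes every first item is unique. If that doesn't hold for the real data, a variation along the following lines may help. This is only a sketch under stated assumptions: the function name `match_by_first_item` is made up for illustration, first items are compared case-insensitively for equality (not with the substring test from the question), and results come back as (small, large) tuples in the order of the small list.

```python
from collections import defaultdict

def match_by_first_item(small_list, large_list):
    """Pair each small list with every large list sharing its first item."""
    # Index the large list once: normalized first item -> rows starting with it.
    index = defaultdict(list)
    for row in large_list:
        index[str(row[0]).lower()].append(row)

    # One dictionary lookup per small row instead of a scan of large_list.
    matches = []
    for row in small_list:
        for candidate in index.get(str(row[0]).lower(), []):
            matches.append((row, candidate))
    return matches

large = [list(s) for s in ["abcd", "efgh", "ijkl", "mnop"]]
small = [list(s) for s in ["eqrs", "atws"]]
print(match_by_first_item(small, large))
# [(['e', 'q', 'r', 's'], ['e', 'f', 'g', 'h']), (['a', 't', 'w', 's'], ['a', 'b', 'c', 'd'])]
```

Building the index touches each large list once, and each small list then costs a single lookup, so the work grows with the sum of the two list sizes rather than their product.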

Answer 2

Here's a one-liner using a list comprehension. I'm not sure how efficient it is, though, since it still compares every list in large against every list in small.

large = [list(c) for c in ["abcd", "efgh", "ijkl", "mnop"]]
small = [list(c) for c in ["eqrs", "atws"]]
ret = [(x,y) for x in large for y in small if x[0] == y[0]]

print(ret)
#output
[(['a', 'b', 'c', 'd'], ['a', 't', 'w', 's']), (['e', 'f', 'g', 'h'], ['e', 'q', 'r', 's'])]
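
To get a feel for the efficiency question, the one-liner can be timed against a dictionary lookup. The following is a rough sketch, not a benchmark: the function names and the synthetic data sizes are made up for illustration, and the timings will vary by machine.

```python
import timeit

def nested_pairs(large, small):
    # Same approach as the one-liner: every row of large against every row of small.
    return [(x, y) for x in large for y in small if x[0] == y[0]]

def indexed_pairs(large, small):
    # Group the small rows by first item, then look each large row up once.
    by_first = {}
    for y in small:
        by_first.setdefault(y[0], []).append(y)
    return [(x, y) for x in large for y in by_first.get(x[0], [])]

# Synthetic data (hypothetical sizes, far smaller than those in the question).
large = [[i, "x"] for i in range(2000)]
small = [[i, "y"] for i in range(0, 2000, 10)]

# Both functions should produce the same pairs in the same order.
assert nested_pairs(large, small) == indexed_pairs(large, small)

print("nested :", timeit.timeit(lambda: nested_pairs(large, small), number=3))
print("indexed:", timeit.timeit(lambda: indexed_pairs(large, small), number=3))
```

The nested version does len(large) * len(small) comparisons, while the indexed version does roughly len(large) + len(small) dictionary operations, so the gap should widen as the lists grow.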

Answer 3

I'm using Python 2.7.11, but I expect this to work on other versions as well.

l1 = [['a','b','c','d'],['e','f','g','h'],['i','j','k','l'],['m','n','o','p']]
l2 = [['e','q','r','s'],['a','t','w','s']]

def org(Smalllist, Largelist):
    # Compare every small list against every large list by first element.
    Final = []
    for s in Smalllist:
        for l in Largelist:
            if s[0] == l[0]:
                Final.append((s, l))
    return Final

I suggest passing the smaller list as the first argument so that the results come out in the order you expect.

When testing, make sure you enter the letters as quoted strings, as I did. Otherwise Python treats them as variable names and the code fails with a NameError.

3 answers

solution1
4 ACCEPTED 2016-07-14 18:51:19

solution2
0 2016-07-14 19:04:54

solution3
-1 2016-07-14 18:49:00
