Python - Comparing each item of a list to every other item in that list

I need to compare every item in a very long list (12471 items) to every other item in the same list. Below is my list:

[array([3, 4, 5])
array([ 6,  8, 10])
array([ 9, 12, 15])
array([12, 16, 20])
array([15, 20, 25])
...]                 #12471 items long

I need to compare the second item of each array to the first item of every other array to see if they're equal, preferably in a very efficient way. Is there a simple and efficient way to do this in Python 2.x?


I worked up a very crude method here, but it is terribly slow:

ls = len(myList)       # 12471
for i in myList:
    l = ls
    # Walk the indices ls-1 .. 0; stopping at l > 0 avoids wrapping to myList[-1]
    while l > 0:
        l -= 1
        if i[1] == myList[l][0]:
            pass  # Do stuff

While this is still theoretically N^2 time (worst case), it should make things a bit better:

import collections

inval = [[3, 4, 5],
[ 6,  8, 10],
[ 9, 12, 15],
[ 12, 14, 15],
[12, 16, 20],
[ 6,  6, 10],
[ 8,  8, 10],
[15, 20, 25]]

by_first = collections.defaultdict(list)
by_second = collections.defaultdict(list)

for item in inval:
    by_first[item[0]].append(item)
    by_second[item[1]].append(item)

for k, vals in by_first.items():
    if k in by_second:
        print "by first:", vals, "by second:", by_second[k]

Output of my simple, short case:

by first: [[6, 8, 10], [6, 6, 10]] by second: [[6, 6, 10]]
by first: [[8, 8, 10]] by second: [[6, 8, 10], [8, 8, 10]]
by first: [[12, 14, 15], [12, 16, 20]] by second: [[9, 12, 15]]

Though this DOES NOT handle duplicates.
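
If "duplicates" here means a row being matched against itself (an assumption; the answer doesn't spell it out), one possible variation is to index row positions instead of the rows and skip pairs where both positions are the same. This is only a sketch: the function name `matching_pairs` and the `skip_self` flag are made up for illustration and are not part of the answer above.

```python
from collections import defaultdict

def matching_pairs(rows, skip_self=True):
    """Return (i, j) index pairs where rows[i][1] == rows[j][0]."""
    # Index row positions (not the rows themselves) by first element.
    by_first = defaultdict(list)
    for j, row in enumerate(rows):
        by_first[row[0]].append(j)

    pairs = []
    for i, row in enumerate(rows):
        # .get() avoids creating empty entries in the defaultdict.
        for j in by_first.get(row[1], ()):
            if skip_self and i == j:
                continue  # a row whose first and second items are equal
            pairs.append((i, j))
    return pairs

inval = [[3, 4, 5], [6, 8, 10], [9, 12, 15], [12, 14, 15],
         [12, 16, 20], [6, 6, 10], [8, 8, 10], [15, 20, 25]]

print(matching_pairs(inval))
# [(1, 6), (2, 3), (2, 4), (5, 1)]
```

Passing `skip_self=False` would also report `(5, 5)` and `(6, 6)`, the rows whose first and second items are equal.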

We can do this in O(N), assuming that Python dict insertion and lookup take O(1) time on average.

  1. In the first scan over the full list, we build a map from each row's first number to the indices of the rows that start with it.
  2. In the second scan, we check whether the map contains the second element of each row. If it does, the map's value is the list of row indices that match the criterion.

myList = [[3, 4, 5], [6, 8, 10], [9, 12, 15], [12, 16, 20], [15, 20, 25]]

# First scan: map each first element to the indices of the rows that start with it.
first_column = dict()
for idx, row in enumerate(myList):
    if row[0] in first_column:
        first_column[row[0]].append(idx)
    else:
        first_column[row[0]] = [idx]

# Second scan: look up each row's second element in the map.
for idx, row in enumerate(myList):
    if row[1] in first_column:
        print('rows matching for element {} from row {} are {}'.format(row[1], idx, first_column[row[1]]))
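
One caveat worth keeping in mind: the O(N) figure covers building and probing the dict, while the number of matches to report can itself grow quadratically when many rows share the same value. The sketch below only counts matches (including a row matching itself) to make that visible. The helper name `count_matches` and the `dense` test data are hypothetical, chosen just for this illustration.

```python
from collections import defaultdict

def count_matches(rows):
    """Count (i, j) pairs with rows[i][1] == rows[j][0] without listing them."""
    # How many rows start with each value.
    first_counts = defaultdict(int)
    for row in rows:
        first_counts[row[0]] += 1
    # Each row contributes one match per row starting with its second element.
    return sum(first_counts.get(row[1], 0) for row in rows)

sparse = [[3, 4, 5], [6, 8, 10], [9, 12, 15], [12, 16, 20], [15, 20, 25]]
dense = [[7, 7, 0]] * 100   # every row matches every row

print(count_matches(sparse))   # 1
print(count_matches(dense))    # 10000
```

So for data like the `sparse` case the two scans dominate, while for data like the `dense` case any code that visits every match does work proportional to N^2 regardless of how the lookup is done.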
