简体   繁体   中英

Python: Most efficient way to compare two lists of integers

I'm trying to compare two lists of integers, each the same size, in Python 2.6. The comparison I need is to compare the first item in List 1 with the first item in List 2, the second item in List 1 with the second item in List 2, and so on, and returns a result if ALL of the list items follow the same comparison criteria. It should behave as follows:

list1 = [1,1,1,1]
list2 = [2,1,2,3]
compare(list1,list2) 
# returns a "list 1 is <= list 2" response.

list1 = [4,1,4,3]
list2 = [2,1,2,3]
compare(list1,list2) 
# returns a "list 1 is >= list 2" response.

list1 = [3,2,3,2]
list2 = [1,4,1,4]
compare(list1,list2) 
# returns None— some items in list1 > list2, and some items in list2 > list1.

I figured I could write the code like the following block, but I don't know if it's the most efficient. My program is going to be calling this method a LOT so I want to streamline this as much as possible.

def compare(list1,list2):
    gt_found = 0
    lt_found = 0
    for x in range(len(list1)):
        if list1[x] > list2[x]:
            gt_found += 1
        elif list1[x] < list2[x]:        
            lt_found += 1
        if gt_found > 0 and lt_found > 0:
            return None   #(some items >, some items <)
    if gt_found > 0:
        return 1          #(list1 >= list2)
    if lt_found > 0:
        return -1         #(list1 <= list2)
    return 0              #(list1 == list2)

Is it already as good as it's going to get (big-O of n), or is there a faster way to go about it (or a way that uses system functions instead)?

CLARIFICATION: I expect the case that returns 'None' to happen the most often, so it is important.

Are you familiar with the wonderful zip function?

import itertools

def compare(xs, ys):
  all_less = True
  all_greater = True

  for x, y in itertools.izip(xs, ys):
    if not all_less and not all_greater:
      return None
    if x > y:
      all_less = False
    elif x < y:
      all_greater = False

  if all_less:
    return "list 1 is <= list 2"
  elif all_greater:
    return "list 1 is >= list 2"
  return None  # all_greater might be set False on final iteration

Zip takes two lists ( xs and ys in this case, but call them whatever you want) and creates an iterator for a sequence of tuples.

izip([1,2,3,4], [4,3,2,1]) == [(1,4), (2,3), (3,2), (4,1)]

This way you can iterate through both lists simultaneously and compare each value in tandem. The time complexity should be O(n), where n is the size of your lists.

It will return early in cases where neither the >= or <= condition are met.

Update

As James Matta points out, itertools.izip performs better than the standard zip in Python 2. This isn't true in Python 3, where the standard zip works the way izip does in older versions.

You can consider a numpy-based vectorized comparison.

import numpy as np

a = [1,1,1,2]
b = [2,2,4,3]

all_larger = np.all(np.asarray(b) > np.asarray(a))  # true if b > a holds elementwise

print all_larger

        True

Clearly, you can engineer the thing to have your answer.

all_larger = lambda b,a : np.all(np.asarray(b) > np.asarray(a))

if all_larger(b,a):
       print "b > a"
elif all_larger(a,b):
       print "a > b"

else
       print "nothing!"

Every type of comparison such as <, >, <=, >=, can be done.

For anyone interested in the performance of the two methods, I named the iterative method 'tortoise' and the numpy method 'hare', and tested it with the code below.

At first, the 'tortoise' won [.009s [T] vs .033s [H]], but I checked it and found that asarray() was being called more often than it need to be. With that fix, the 'hare' won again, [.009s [T] vs .006s [H]].

The data is here: http://tny.cz/64d6e5dc
It consists of 28 lines of about 950 elements in length. Four of the lines collectively >= all the others.

It might be interesting to see how the performance works on larger data sets.

import itertools, operator, math
import cProfile
import numpy as np

data = #SEE PASTEBIN

def tortoise(xs, ys):
    all_less = True
    all_greater = True

    for x, y in zip(xs, ys):
        if not all_less and not all_greater:
          return None
        if x > y:
          all_less = False
        elif x < y:
          all_greater = False

    if all_greater and all_less:
        return 0
    if all_greater:
        return 1
    if all_less:
        return -1
    return None  # all_greater might be set False on final iteration


hare = lambda b,a : np.all(b >= a)

def find_uniques_tortoise():
    include_list = range(len(data))
    current_list_index = 0
    while current_list_index < len(data):
        if current_list_index not in include_list:
            current_list_index += 1
            continue
        for x in range(current_list_index+1,len(data)):
            if x not in include_list:
                continue
            result = tortoise(data[current_list_index], data[x])
            if result is None: #no comparison
                continue 
        elif result == 1 or result == 0: # this one beats the other one
            include_list.remove(x)
            continue
        elif result == -1: #the other one beats this one
            include_list.remove(current_list_index)
            break
    current_list_index +=1
return include_list

def find_uniques_hare():
    include_list = range(len(data))
    current_list_index = 0
    #do all asarray()s beforehand for max efficiency
    for x in range(len(data)): 
        data[x] = np.asarray(data[x])
    while current_list_index < len(data):
        if current_list_index not in include_list:
            current_list_index += 1
            continue
        for x in range(current_list_index+1,len(data)):
            if x not in include_list:
                continue
            if hare(data[current_list_index], data[x]): # this one beats the other one, or it's a tie
                include_list.remove(x)
            #   print x
                continue
            elif hare(data[x], data[current_list_index]): #the other one beats this one
                include_list.remove(current_list_index)
            #   print current_list_index
                break
            else: #no comparison
                continue
        current_list_index +=1
    return include_list             



cProfile.run('find_uniques_tortoise()')
cProfile.run('find_uniques_hare()')

print find_uniques_tortoise()
print   
print find_uniques_hare()       

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM