I'm trying to compare two lists of integers, each the same size, in Python 2.6. I need to compare the first item in List 1 with the first item in List 2, the second item in List 1 with the second item in List 2, and so on, and return a result if ALL of the list items satisfy the same comparison. It should behave as follows:
list1 = [1,1,1,1]
list2 = [2,1,2,3]
compare(list1,list2)
# returns a "list 1 is <= list 2" response.
list1 = [4,1,4,3]
list2 = [2,1,2,3]
compare(list1,list2)
# returns a "list 1 is >= list 2" response.
list1 = [3,2,3,2]
list2 = [1,4,1,4]
compare(list1,list2)
# returns None: some items in list1 > list2, and some items in list2 > list1.
I figured I could write the code like the following block, but I don't know if it's the most efficient. My program is going to be calling this method a LOT so I want to streamline this as much as possible.
def compare(list1,list2):
    gt_found = 0
    lt_found = 0
    for x in range(len(list1)):
        if list1[x] > list2[x]:
            gt_found += 1
        elif list1[x] < list2[x]:
            lt_found += 1
        if gt_found > 0 and lt_found > 0:
            return None #(some items >, some items <)
    if gt_found > 0:
        return 1 #(list1 >= list2)
    if lt_found > 0:
        return -1 #(list1 <= list2)
    return 0 #(list1 == list2)
Is it already as good as it's going to get (big-O of n), or is there a faster way to go about it (or a way that uses system functions instead)?
CLARIFICATION: I expect the case that returns 'None' to happen the most often, so it is important.
Are you familiar with the wonderful zip function?
import itertools

def compare(xs, ys):
    all_less = True
    all_greater = True
    for x, y in itertools.izip(xs, ys):
        if not all_less and not all_greater:
            return None
        if x > y:
            all_less = False
        elif x < y:
            all_greater = False
    if all_less:
        return "list 1 is <= list 2"
    elif all_greater:
        return "list 1 is >= list 2"
    return None # all_greater might be set False on final iteration
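zip takes two lists (xs and ys in this case, but call them whatever you want) and pairs up their elements into a sequence of tuples. In Python 2, zip builds that sequence as a list, while itertools.izip returns an iterator that produces the same tuples one at a time.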
list(itertools.izip([1,2,3,4], [4,3,2,1])) == [(1,4), (2,3), (3,2), (4,1)]
This way you can iterate through both lists simultaneously and compare each value in tandem. The time complexity should be O(n), where n is the size of your lists.
It will return early in cases where neither the >= nor the <= condition is met.
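To make the early return concrete, here is a minimal sketch (not part of the original answer) that also reports how many pairs were examined before the result was known. The name `compare_counting` and the returned count are illustrative assumptions. It uses the built-in `zip` and numeric return codes so that it runs unchanged on Python 2 and Python 3.

```python
def compare_counting(xs, ys):
    """Return (result, pairs_examined) for an elementwise comparison.

    result is -1 if every x <= y, 1 if every x >= y, 0 if the lists are
    equal, and None if neither condition holds.
    """
    all_less = True      # no pair with x > y seen so far
    all_greater = True   # no pair with x < y seen so far
    examined = 0
    for x, y in zip(xs, ys):
        examined += 1
        if x > y:
            all_less = False
        elif x < y:
            all_greater = False
        if not all_less and not all_greater:
            # Both directions are ruled out, so stop immediately.
            return None, examined
    if all_less and all_greater:
        return 0, examined
    return (-1 if all_less else 1), examined


# The mixed case from the question stops after the second of four pairs.
print(compare_counting([3, 2, 3, 2], [1, 4, 1, 4]))  # prints (None, 2)
```

Because the question says the None case is the most common one, checking the two flags right after each update (rather than at the top of the next iteration) saves one pair per call.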
As James Matta points out, itertools.izip performs better than the standard zip in Python 2. This isn't true in Python 3, where the standard zip works the way izip does in older versions.
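If the same code has to run on both versions, one common approach is to pick the lazy variant at import time. This is only a sketch; the alias `lazy_zip` is a made-up name, not something from the standard library.

```python
try:
    # Python 2: izip yields pairs lazily instead of building a list.
    from itertools import izip as lazy_zip
except ImportError:
    # Python 3: itertools has no izip; the built-in zip is already lazy.
    lazy_zip = zip

pairs = lazy_zip([1, 2, 3, 4], [4, 3, 2, 1])
first = next(pairs)   # only the first pair has been produced so far
rest = list(pairs)    # consume the remaining pairs
```

After this, `first` is `(1, 4)` and `rest` holds the other three tuples, which shows that the pairs are generated on demand rather than all at once.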
You can consider a numpy-based vectorized comparison.
import numpy as np
a = [1,1,1,2]
b = [2,2,4,3]
all_larger = np.all(np.asarray(b) > np.asarray(a)) # true if b > a holds elementwise
print all_larger
True
You can adapt this to produce the answer you need.
all_larger = lambda b,a : np.all(np.asarray(b) > np.asarray(a))
if all_larger(b,a):
    print "b > a"
elif all_larger(a,b):
    print "a > b"
else:
    print "nothing!"
Any elementwise comparison, such as <, >, <=, or >=, can be done the same way.
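As a sketch of how the non-strict comparisons could reproduce the return values from the question (1, -1, 0 or None), something like the following should work. The function name `compare_np` is an assumption made for illustration; only `np.asarray` and `np.all` are used from NumPy.

```python
import numpy as np


def compare_np(a, b):
    """Elementwise comparison of two equal-length sequences.

    Returns 0 if equal, 1 if a >= b everywhere, -1 if a <= b everywhere,
    and None otherwise.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    ge = bool(np.all(a >= b))   # True if no element of a is below b
    le = bool(np.all(a <= b))   # True if no element of a is above b
    if ge and le:
        return 0
    if ge:
        return 1
    if le:
        return -1
    return None
```

Note that each `np.all(...)` call scans the whole array, so unlike the loop version there is no early exit when the lists turn out not to be comparable.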
For anyone interested in the performance of the two methods, I named the iterative method 'tortoise' and the numpy method 'hare', and tested it with the code below.
At first, the 'tortoise' won [.009s [T] vs .033s [H]], but on checking I found that asarray() was being called more often than it needed to be. With that fix, the 'hare' came out ahead [.009s [T] vs .006s [H]].
The data is here: http://tny.cz/64d6e5dc
It consists of 28 lines of about 950 elements each. Four of the lines are collectively >= all the others.
It might be interesting to see how the performance works on larger data sets.
import itertools, operator, math
import cProfile
import numpy as np
data = #SEE PASTEBIN
def tortoise(xs, ys):
    all_less = True
    all_greater = True
    for x, y in zip(xs, ys):
        if not all_less and not all_greater:
            return None
        if x > y:
            all_less = False
        elif x < y:
            all_greater = False
    if all_greater and all_less:
        return 0
    if all_greater:
        return 1
    if all_less:
        return -1
    return None # all_greater might be set False on final iteration
hare = lambda b,a : np.all(b >= a)
def find_uniques_tortoise():
    include_list = range(len(data))
    current_list_index = 0
    while current_list_index < len(data):
        if current_list_index not in include_list:
            current_list_index += 1
            continue
        for x in range(current_list_index+1,len(data)):
            if x not in include_list:
                continue
            result = tortoise(data[current_list_index], data[x])
            if result is None: #no comparison
                continue
            elif result == 1 or result == 0: # this one beats the other one
                include_list.remove(x)
                continue
            elif result == -1: #the other one beats this one
                include_list.remove(current_list_index)
                break
        current_list_index +=1
    return include_list
def find_uniques_hare():
    include_list = range(len(data))
    current_list_index = 0
    #do all asarray()s beforehand for max efficiency
    for x in range(len(data)):
        data[x] = np.asarray(data[x])
    while current_list_index < len(data):
        if current_list_index not in include_list:
            current_list_index += 1
            continue
        for x in range(current_list_index+1,len(data)):
            if x not in include_list:
                continue
            if hare(data[current_list_index], data[x]): # this one beats the other one, or it's a tie
                include_list.remove(x)
                # print x
                continue
            elif hare(data[x], data[current_list_index]): #the other one beats this one
                include_list.remove(current_list_index)
                # print current_list_index
                break
            else: #no comparison
                continue
        current_list_index +=1
    return include_list
cProfile.run('find_uniques_tortoise()')
cProfile.run('find_uniques_hare()')
print find_uniques_tortoise()
print
print find_uniques_hare()
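For anyone who wants to try larger inputs, a rough timing harness along these lines might be a starting point. Everything here is an assumption made for illustration: the randomly generated data, the size `n`, and the helper names `loop_compare` and `numpy_compare` are not from the benchmark above, and no particular timings are implied.

```python
import random
import timeit

import numpy as np


def loop_compare(xs, ys):
    """Pure-Python comparison with early exit (same idea as tortoise)."""
    all_less = all_greater = True
    for x, y in zip(xs, ys):
        if x > y:
            all_less = False
        elif x < y:
            all_greater = False
        if not all_less and not all_greater:
            return None
    if all_less and all_greater:
        return 0
    return 1 if all_greater else -1


def numpy_compare(a, b):
    """Vectorized comparison; a and b are assumed to be NumPy arrays already."""
    ge = bool(np.all(a >= b))
    le = bool(np.all(a <= b))
    if ge and le:
        return 0
    if ge:
        return 1
    if le:
        return -1
    return None


random.seed(0)                      # fixed seed so runs are repeatable
n = 100000                          # hypothetical size; adjust as needed
low = [random.randint(0, 50) for _ in range(n)]
high = [v + random.randint(0, 5) for v in low]   # high >= low everywhere
low_arr, high_arr = np.asarray(low), np.asarray(high)

# Worst case for the loop: the lists are comparable, so it cannot exit early.
t_loop = timeit.timeit(lambda: loop_compare(high, low), number=10)
t_np = timeit.timeit(lambda: numpy_compare(high_arr, low_arr), number=10)
print("loop:  %.4fs" % t_loop)
print("numpy: %.4fs" % t_np)
```

The arrays are converted once, outside the timed calls, for the same reason the asarray() calls were hoisted in the test above. For data where most pairs of lists are not comparable, the loop can stop after a few elements, so it is worth timing that case separately before drawing conclusions.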