
Comparing value in a list to all other values

I have a list of latitudes, lats. I am trying to compare each latitude to every other latitude and find each pair of list items that fall within 0.01 of each other. The code I currently have does just that; however, it also compares each list value to itself.

lats = [79.826, 79.823, 79.855, 79.809]

for i in lats:
    for j in lats:
        if (i - 0.1) <= j <= (i + 0.1):
            print(str(i) +" and "+ str(j))

This returns the output:

79.826 and 79.826
79.826 and 79.823
79.826 and 79.855
79.826 and 79.809
79.823 and 79.826
79.823 and 79.823
79.823 and 79.855
79.823 and 79.809
79.855 and 79.826
79.855 and 79.823
79.855 and 79.855
79.855 and 79.809
79.809 and 79.826
79.809 and 79.823
79.809 and 79.855
79.809 and 79.809

You are implicitly computing a cross product; you could have written

for i, j in itertools.product(lats, repeat=2):
    if i - 0.1 <= j <= i + 0.1:
        ...

instead. What you want, though, are the 2-element combinations from the list:

for i, j in itertools.combinations(lats, 2):
    ...

While the itertools solution should be the preferred way to iterate over the combinations of lats, you may be interested in coding this "by hand". Assuming that what you really want is any two lats in any order, with no pair repeated, you can simply restrict the second loop progressively:

for i, x in enumerate(lats):
    for y in lats[i + 1:]:
        ...

Also, the condition as currently written is more complex than it needs to be. What you really want is that the two values x and y are no more than some value d apart, hence you could write the condition:

(x - d) <= y <= (x + d)

as:

abs(x - y) <= d
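
Putting the restricted inner loop and the simplified condition together, a minimal sketch might look like the following. The function name close_pairs and the parameter name d are illustrative choices, not part of the original answer.

```python
def close_pairs(lats, d):
    """Return each unordered pair of values that differ by at most d."""
    pairs = []
    for i, x in enumerate(lats):
        # Only look at items after position i, so no item is paired with
        # itself and no pair is reported twice.
        for y in lats[i + 1:]:
            if abs(x - y) <= d:
                pairs.append((x, y))
    return pairs


lats = [79.826, 79.823, 79.855, 79.809]
for x, y in close_pairs(lats, 0.01):
    print(f"{x} and {y}")  # prints: 79.826 and 79.823
```

With d = 0.01 only one pair qualifies for this sample list; with d = 0.1 all six unordered pairs do.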

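Just add "and i != j" to the condition. Note that this compares values, so two separate list entries with the same latitude are skipped as well: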

lats = [79.826, 79.823, 79.855, 79.809]

for i in lats:
    for j in lats:
        if (i - 0.1) <= j <= (i + 0.1) and i != j:
            print(str(i) +" and "+ str(j))

outputs:

79.826 and 79.823
79.826 and 79.855
79.826 and 79.809
79.823 and 79.826
79.823 and 79.855
79.823 and 79.809
79.855 and 79.826
79.855 and 79.823
79.855 and 79.809
79.809 and 79.826
79.809 and 79.823
79.809 and 79.855

There is this terse version using itertools.combinations and abs:

from itertools import combinations
lats = [79.826, 79.823, 79.855, 79.809]
print([c for c in combinations(lats, 2) if abs(c[0] - c[1]) <= 0.01])

which gives:

[(79.826, 79.823)]

Or with the formatting:

from itertools import combinations
lats = [79.826, 79.823, 79.855, 79.809]
close_lats = [c for c in combinations(lats, 2) if abs(c[0] - c[1]) <= 0.01]
for combo in close_lats:
    print(f"{combo[0]} and {combo[1]}")

giving:

79.826 and 79.823

As an aside, your question says you seek those that are within 0.01 of each other, but your code sample seems to look within 0.1 of each other.

For efficiency you can use one of the combinatoric iterators from itertools (which one depends on what you want the final result to be) and isclose from the math module:

from itertools import permutations
from math import isclose

lats = [79.826, 79.823, 79.855, 79.809]

for l1, l2 in permutations(lats, r=2):
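    # abs_tol is an absolute difference; rel_tol=0.01 would instead mean
    # "within 1% of the larger value", which is about 0.8 for these latitudes.
    if isclose(l1, l2, abs_tol=0.1):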
        print(f"{l1} and {l2}")

Output:

79.826 and 79.823
79.826 and 79.855
79.826 and 79.809
79.823 and 79.826
79.823 and 79.855
79.823 and 79.809
79.855 and 79.826
79.855 and 79.823
79.855 and 79.809
79.809 and 79.826
79.809 and 79.823
79.809 and 79.855
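
One thing to keep in mind with math.isclose is that rel_tol is scaled by the larger of the two values, while abs_tol is a plain absolute difference. For latitudes near 80, rel_tol=0.01 allows a gap of roughly 0.8. The sketch below contrasts the two on the sample list; the helper name pairs_within is hypothetical and only serves the illustration.

```python
from itertools import combinations
from math import isclose

lats = [79.826, 79.823, 79.855, 79.809]


def pairs_within(values, **tol):
    """Unordered pairs that math.isclose accepts under the given tolerance."""
    return [(a, b) for a, b in combinations(values, 2) if isclose(a, b, **tol)]


relative = pairs_within(lats, rel_tol=0.01)  # within 1% of the larger value
absolute = pairs_within(lats, abs_tol=0.01)  # within 0.01 in absolute terms

print(len(relative))  # 6: every pair passes the relative test
print(absolute)       # [(79.826, 79.823)]
```

combinations is used here instead of permutations so that each pair appears once; swap it back if both orderings are wanted.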

I think you should change your algorithm, first to avoid reporting the same pair of lats twice (e.g. 79.826 and 79.823, then 79.823 and 79.826), and second to improve performance by reducing the complexity from O(n^2) to O(n log n) (the cost of sorting the list).

It's best to sort your list of lats and use two pointers to track the lower and upper bounds of the window of items that fall within 0.1 of each other.

Here is the code:

lats = [79.826, 79.823, 79.855, 79.809]
lats.sort()

i = 0
j = 1
while j < len(lats):
    if lats[j] - lats[i] <= 0.1:
        print(lats[i: j], lats[j])
        j += 1
    else:
        i += 1

Output:

[79.809] 79.823
[79.809, 79.823] 79.826
[79.809, 79.823, 79.826] 79.855
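
If only the number of close pairs is needed rather than the pairs themselves, sorting does make an O(n log n) count possible. The sketch below is one way to do it with the standard bisect module; count_close_pairs is a hypothetical name, and comparing against x + d can land differently from y - x <= d at exact boundaries because of floating-point rounding.

```python
from bisect import bisect_right


def count_close_pairs(lats, d):
    """Count unordered pairs whose values differ by at most d."""
    ordered = sorted(lats)
    total = 0
    for i, x in enumerate(ordered):
        # Index just past the last value <= x + d, searching only to the
        # right of position i; everything in between is a partner for x.
        hi = bisect_right(ordered, x + d, lo=i + 1)
        total += hi - (i + 1)
    return total


lats = [79.826, 79.823, 79.855, 79.809]
print(count_close_pairs(lats, 0.1))   # 6
print(count_close_pairs(lats, 0.01))  # 1
```

Listing every pair still takes time proportional to the number of pairs, so the O(n log n) bound applies to counting, not to printing them all.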

If you sort your list first, you can make the comparison much more efficient: you can break out of the inner loop as soon as a comparison fails, because all subsequent values will be even larger.

lats = [79.809, 79.823, 79.826, 79.855]
lats_sorted = sorted(lats)
for index, lat1 in enumerate(lats_sorted[:-1]):
    for lat2 in lats_sorted[index+1:]:
        if (lat2 - lat1) < 0.1:
            print(str(lat1) + " and " + str(lat2))
        else:
            break

I made a small runtime comparison for a large list (5000 elements):

import itertools
import time

import numpy as np


def func1(lats):
    pairs = []
    lats_sorted = sorted(lats)
    for index, lat1 in enumerate(lats_sorted[:-1]):
        for lat2 in lats_sorted[index+1:]:
            if lat2 - lat1 <= 0.1:
                pairs.append((lat1, lat2))
            else:
                break
    return pairs


def func2(lats):
    pairs = []
    for i in lats:
        for j in lats:
            if (i - 0.1) <= j <= (i + 0.1):
                pairs.append((i, j))
    return pairs


def func3(lats):
    pairs = []
    for i, j in itertools.combinations(lats, 2):
        if (i - 0.1) <= j <= (i + 0.1):
            pairs.append((i, j))
    return pairs

def func4(lats):
    pairs = []
    for i in lats:
        for j in lats:
            if (i - 0.1) <= j <= (i + 0.1) and i != j:
                pairs.append((i, j))
    return pairs


lats = np.random.randint(0, 100000, 5000) / 1000

print(lats)

func_list = [func1, func2, func3, func4]

for func in func_list:

    start = time.time()
    pairs = func(lats)
    end = time.time()
    print(f"{func.__name__}: time = {end - start} s, pair count = {len(pairs)}")

The output is

[79.759 45.091 19.409 ... 24.691  5.114 64.561]
func1: time = 0.033899545669555664 s, pair count = 24972
func2: time = 6.784521102905273 s, pair count = 55155
func3: time = 2.624063491821289 s, pair count = 25077
func4: time = 6.442306041717529 s, pair count = 49929

showing that my proposed algorithm (func1) is much faster than the others. The slight count difference between func1 and func3 (the itertools solution) seems to be a numerical precision issue.
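
The precision explanation can be probed with a small experiment. The sketch below is an assumption-laden check, not part of the original benchmark: it walks over values of the form k/1000 (the same shape as the random test data) and records where the subtraction test from func1 and the range test from func3 disagree. The name boundary_mismatches and its parameters are made up for this illustration.

```python
def boundary_mismatches(limit=1000, d=0.1):
    """Find pairs k/1000, (k+m)/1000 where the two closeness tests disagree."""
    mismatches = []
    for k in range(limit):
        a = k / 1000
        for m in range(95, 106):  # nominal differences of 0.095 to 0.105
            b = (k + m) / 1000
            by_subtraction = (b - a) <= d        # test used in func1
            by_range = (a - d) <= b <= (a + d)   # test used in func3
            if by_subtraction != by_range:
                mismatches.append((a, b, m))
    return mismatches


found = boundary_mismatches()
print(len(found))
# Nominal differences (in thousandths) at which the tests disagreed.
print(sorted({m for _, _, m in found}))
```

Any pair it reports has a nominal difference of exactly 0.1, which is where rounding decides on which side of the threshold a pair falls.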
