Python: Fastest way to compare array elements

Question

I'm looking for the fastest way to output the index of the first difference between two arrays in Python. For example, take the following arrays:

test1 = [1, 3, 5, 8]
test2 = [1]
test3 = [1, 3]

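Comparing test1 and test2, I want to output 1, while comparing test1 and test3 should output 2.

In other words, I'm looking for the equivalent of this statement: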

import numpy as np
np.where(np.where(test1 == test2, test1, 0) == '0')[0][0]

but for arrays of different lengths.

Any help is appreciated.

Answer 1

For lists, this works:

from itertools import zip_longest

def find_first_diff(list1, list2):
    for index, (x, y) in enumerate(zip_longest(list1, list2, 
                                               fillvalue=object())):
        if x != y:
            return index

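zip_longest pads the shorter list with None or with a supplied fill value. The standard zip does not work if the difference is caused by differing list lengths rather than by actually different values in the lists.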
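A small illustration of that difference, using the arrays from the question. It only shows what the two functions yield; the variable names pairs_zip and pairs_longest are made up for this sketch.

```python
from itertools import zip_longest

test1 = [1, 3, 5, 8]
test3 = [1, 3]

# zip stops at the end of the shorter input, so the length difference is never seen
pairs_zip = list(zip(test1, test3))

# zip_longest pads the shorter input (with None unless fillvalue is given)
pairs_longest = list(zip_longest(test1, test3))

print(pairs_zip)      # [(1, 1), (3, 3)]
print(pairs_longest)  # [(1, 1), (3, 3), (5, None), (8, None)]
```

With plain zip, every pair matches and the loop ends without ever reaching index 2.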

On Python 2, use izip_longest.

Update: the function now creates a unique fill value to avoid potential problems with None as a list value. object() is unique:

>>> o1 = object()
>>> o2 = object()
>>> o1 == o2
False
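To see why the fill value matters, here is a minimal sketch. The helper first_diff and its fillvalue parameter are hypothetical, added only so both fill values can be tried on the same input; it is not part of the answer's function.

```python
from itertools import zip_longest

def first_diff(list1, list2, fillvalue):
    # Hypothetical helper: same loop as find_first_diff, but with the
    # fill value passed in so the two choices can be compared.
    for index, (x, y) in enumerate(zip_longest(list1, list2, fillvalue=fillvalue)):
        if x != y:
            return index
    return None  # no difference found

a = [1, None]
b = [1]

with_none = first_diff(a, b, None)          # the padding None equals the real None
with_sentinel = first_diff(a, b, object())  # a fresh object() equals nothing else

print(with_none)      # None
print(with_sentinel)  # 1
```

With None as padding, the real None at index 1 compares equal to the padding and the difference goes unnoticed.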

This pure-Python approach may be faster than a NumPy solution. It depends on the actual data and other circumstances.

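1. Converting the lists to NumPy arrays also takes time. This may actually take longer than finding the index with the function above. If you don't intend to use the NumPy arrays for other calculations, the conversion can add considerable overhead.
2. NumPy always searches the full array. If the difference comes early, you do a lot more work than you need to.
3. NumPy creates a number of intermediate arrays. This costs memory and time.
4. NumPy needs to construct intermediate arrays with the maximum length. Comparing many small arrays with very large ones is unfavorable here.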
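The points about full scans and intermediate arrays can be made concrete with a rough sketch. It assumes two inputs that already differ at index 0; the helper first_diff_counting is hypothetical and exists only to count how many pairs the pure-Python loop inspects.

```python
import numpy as np

n = 100_000
a = np.arange(n)
b = a.copy()
b[0] = -1  # the arrays already differ at index 0

# NumPy evaluates the comparison for every element and materialises a
# boolean array of length n before the first True can be located.
mask = a != b
first_numpy = int(np.argmax(mask))  # argmax returns the first True position

def first_diff_counting(xs, ys):
    """Return (index of first mismatch or None, number of pairs compared)."""
    compared = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        compared += 1
        if x != y:
            return i, compared
    return None, compared

# The plain loop stops after a single comparison.
first_python = first_diff_counting(a.tolist(), b.tolist())

print(mask.size, first_numpy)  # 100000 0
print(first_python)            # (0, 1)
```

Note that the a.tolist() calls are themselves linear in n; they stand in for data that is already held in lists.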

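In general, NumPy is faster than a pure-Python solution in many cases. But each case is a bit different, and there are situations where pure Python is faster.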

Answer 2

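With numpy arrays (which will be faster for big arrays) you could check the lengths of the lists and then (also) check the overlapping parts, something like the following (obviously slicing the longer one to the length of the shorter):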

import numpy as np

n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
  ans = x[0]
elif len(test1) != len(test2):
  ans = n
else:
  ans = None
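The snippet assumes test1 and test2 are already NumPy arrays; on plain lists, `test1[:n] != test2[:n]` compares the two slices as whole objects and yields a single bool rather than an element-wise result. A minimal sketch that wraps the same logic in a function and converts its inputs first (the name first_diff_numpy is made up here):

```python
import numpy as np

def first_diff_numpy(seq1, seq2):
    """Index of the first difference, or None if the sequences are identical."""
    a = np.asarray(seq1)  # no copy if the input is already an ndarray
    b = np.asarray(seq2)
    n = min(len(a), len(b))
    x = np.where(a[:n] != b[:n])[0]  # indices where the overlap differs
    if len(x) > 0:
        return int(x[0])
    if len(a) != len(b):
        return n  # overlap agrees, so the first difference is the extra element
    return None

print(first_diff_numpy([1, 3, 5, 8], [1]))     # 1
print(first_diff_numpy([1, 3, 5, 8], [1, 3]))  # 2
```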

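Edit - despite this being voted down, I will leave my answer up here in case someone else needs to do something similar.

If the starting arrays are large and already numpy arrays, then this is the fastest method. Also, I had to modify Andy's code to get it to work. In order: 1. my suggestion, 2. Paidric's (now deleted, but the most elegant), 3. Andy's accepted answer, 4. zip (non-numpy), 5. vanilla Python without zip, as per @leekaiinthesky

0.1ms, 9.6ms, 0.6ms, 2.8ms, 2.3ms

If the conversion to ndarray is included in the timeit, then the non-numpy, no-zip method is fastest:

7.1ms, 17.1ms, 7.7ms, 2.8ms, 2.3ms

Even more so if the difference between the two lists is at index 1,000 rather than 10,000:

7.1ms, 17.1ms, 7.7ms, 0.3ms, 0.2ms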

import timeit

setup = """
import numpy as np
from itertools import zip_longest
list1 = [1 for i in range(10000)] + [4, 5, 7]
list2 = [1 for i in range(10000)] + [4, 4]
test1 = np.array(list1)
test2 = np.array(list2)

def find_first_diff(l1, l2):
    for index, (x, y) in enumerate(zip_longest(l1, l2, fillvalue=object())):
        if x != y:
            return index

def findFirstDifference(list1, list2):
  minLength = min(len(list1), len(list2))
  for index in range(minLength):
    if list1[index] != list2[index]:
      return index
  return minLength
"""

fn = ["""
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
  ans = x[0]
elif len(test1) != len(test2):
  ans = n
else:
  ans = None""",
"""
x = np.where(np.in1d(list1, list2) == False)[0]
if len(x) > 0:
  ans = x[0]
else:
  ans = None""",
"""
x = test1
y = np.resize(test2, x.shape)
x = np.where(np.where(x == y, x, 0) == 0)[0]
if len(x) > 0:
  ans = x[0]
else:
  ans = None""",
"""
ans = find_first_diff(list1, list2)""",
"""
ans = findFirstDifference(list1, list2)"""]

for f in fn:
  print(timeit.timeit(f, setup, number = 1000))
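A note on reading the output: timeit.timeit returns the total time in seconds for all `number` executions, so with number = 1000 the printed value is numerically the same as milliseconds per call. That the quoted figures were read this way is an inference. A minimal sketch with a stand-in statement:

```python
import timeit

number = 1000  # same repeat count as in the benchmark above

# "sum(range(100))" is only a placeholder statement for this illustration
total_seconds = timeit.timeit("sum(range(100))", number=number)

# total seconds / number of runs = seconds per call; times 1000 gives ms
per_call_ms = total_seconds / number * 1000
print(f"{per_call_ms:.4f} ms per call")
```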

Answer 3

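The fastest algorithm would compare every element up to the first difference and no more. So iterating through the two lists pairwise like this would give you that: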

def findFirstDifference(list1, list2):
  minLength = min(len(list1), len(list2))
  # range is lazy on Python 3; on Python 2, use xrange instead
  for index in range(minLength):
    if list1[index] != list2[index]:
      return index
  return minLength # the two lists agree where they both have values, so return the next index

This gives the output you want:

print(findFirstDifference(test1, test3))
> 2

Answer 4

Here is one way to do it:

def compare_lists(lista, listb):
    """
    Compare two lists and return the first index where they differ. If
    they are equal, return the list length.
    """
    # zip is already lazy on Python 3; on Python 2, itertools.izip is the lazy version
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    return min(len(lista), len(listb))

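The algorithm is simple: zip (or, on Python 2, the more efficient izip) the two lists, then compare them element by element.
The enumerate function gives the index position, which we can return if a difference is found.
If we exit the for loop without returning, one of two things has happened:
1. The two lists are identical. In this case, we want to return the length of either list.
2. The lists have different lengths and are equal up to the length of the shorter list. In this case, we want to return the length of the shorter list.
In either case, the min(...) expression is what we want.
This function has a bug: if you compare two empty lists, it returns 0, which seems wrong. I'll leave it to you to fix as an exercise.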
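One possible way to approach that exercise is sketched below. The answer does not say what the result for equal lists should be, so returning None is an assumption, and the name compare_lists_or_none is made up for this sketch.

```python
def compare_lists_or_none(lista, listb):
    """
    Hypothetical variant: return the first index where the lists differ,
    or None if they are equal (which includes two empty lists).
    """
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    if len(lista) == len(listb):
        return None  # assumption: "no difference" is reported as None
    # the overlap agrees, so the first difference is the first extra element
    return min(len(lista), len(listb))

print(compare_lists_or_none([], []))            # None
print(compare_lists_or_none([1, 3, 5, 8], [1])) # 1
```

Returning None keeps "no difference" distinguishable from a difference at index 0, at the cost of a return type that callers have to check.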

Answer 5

Thanks for all your suggestions, I just found a much simpler way for my problem:

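import numpy as np
x = np.array(test1)
y = np.resize(np.array(test2), x.shape)  # repeats test2 cyclically to fill x.shape
# compare with the integer 0, not the string '0'; raises IndexError if nothing differs
np.where(np.where(x == y, x, 0) == 0)[0][0]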
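This approach works for the arrays in the question but relies on two things: 0 is used as the "differs" marker, and np.resize fills the extra positions by repeating test2. The sketch below wraps the same steps in a hypothetical function first_diff_resize (comparing against the integer 0) to show inputs where those assumptions break.

```python
import numpy as np

def first_diff_resize(test1, test2):
    # Same steps as the snippet above, comparing against the integer 0.
    x = np.array(test1)
    y = np.resize(np.array(test2), x.shape)
    return int(np.where(np.where(x == y, x, 0) == 0)[0][0])

# Works for the arrays from the question.
print(first_diff_resize([1, 3, 5, 8], [1, 3]))  # 2

# A matching 0 in the data is indistinguishable from the 0 used as a marker:
# the result is 0, although the first difference is at index 1.
print(first_diff_resize([0, 1], [0, 2]))        # 0

# np.resize repeats test2, so a periodic test1 looks identical to it and
# no difference is found at all (the expected answer would be 2).
try:
    first_diff_resize([1, 3, 1, 3], [1, 3])
except IndexError:
    print("no difference found")
```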

Answer 6

Here's an admittedly not very pythonic, numpy-free stab at it:

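# zip() returns an iterator on Python 3, so build a list to allow slicing
b = list(zip(test1, test2))
c = 0
# drop matching pairs from the front, counting them as we go
while b and b[0][0] == b[0][1]:
    b = b[1:]
    c = c + 1
print(c)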

Answer 7

For Python 3.x:

def first_diff_index(ls1, ls2):
    l = min(len(ls1), len(ls2))
    return next((i for i in range(l) if ls1[i] != ls2[i]), l)

(On Python 2.7, replace range with xrange.)
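A short usage sketch of the function above, with a comment on what the second argument to next() does. The inputs other than the question's arrays are made up for illustration.

```python
def first_diff_index(ls1, ls2):
    l = min(len(ls1), len(ls2))
    # next() takes the first index the generator yields; if the generator is
    # exhausted without finding a mismatch, the default l is returned instead.
    return next((i for i in range(l) if ls1[i] != ls2[i]), l)

print(first_diff_index([1, 3, 5, 8], [1]))     # 1
print(first_diff_index([1, 3, 5, 8], [1, 3]))  # 2
print(first_diff_index([1, 3, 5, 8], [1, 4]))  # 1
print(first_diff_index([1, 3], [1, 3]))        # 2
```

As the last call shows, identical lists return their common length rather than a separate "no difference" value.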

7 answers

Answer 1: 6, accepted, 2015-05-10 17:40:26
Answer 2: 4, 2015-05-10 17:54:51
Answer 3: 1, 2015-05-10 17:43:29
Answer 4: 1, 2015-05-10 18:03:06
Answer 5: 0, 2015-05-10 18:06:59
Answer 6: 0, 2015-05-10 18:18:11
Answer 7: 0, 2017-11-22 15:58:37
