Finding unique pairs in a list of pairs

Question

I have a (large) list of lists of integers, e.g.,

a = [
    [1, 2],
    [3, 6],
    [2, 1],
    [3, 5],
    [3, 6]
    ]

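Most pairs occur twice, and the order of the integers doesn't matter (i.e., [1, 2] is equivalent to [2, 1]). I'd now like to find the pairs that occur only once and get a Boolean list indicating which ones they are. For the example above,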

b = [False, False, False, True, False]

Since a is typically large, I'd like to avoid explicit loops. Mapping to frozensets might be advisable, but I'm not sure whether that's overkill.

Answer 1

from collections import Counter

ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]

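We can use a Counter to get the count of each list (converting each list to a frozenset so that order is ignored), and then check whether each list occurs exactly once.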
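A self-contained version of the same idea, wrapped in a function. The name `unique_pair_mask` is hypothetical, chosen for illustration. It builds each frozenset once and reuses it for the lookup. Because every entry has exactly two elements, a frozenset is a safe key even for a pair like [3, 3]: it becomes frozenset({3}), and no other pair maps to that set.

```python
from collections import Counter


def unique_pair_mask(pairs):
    """Return a list of booleans: True where the unordered pair occurs once."""
    # frozenset makes [1, 2] and [2, 1] hash and compare equal.
    keys = [frozenset(p) for p in pairs]
    counts = Counter(keys)
    # Reuse the precomputed keys instead of building each frozenset twice.
    return [counts[k] == 1 for k in keys]


a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
print(unique_pair_mask(a))  # [False, False, False, True, False]
```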

Answer 2

Here's a solution with NumPy that is 10 times faster than the suggested frozenset solution:

import numpy

a = numpy.array(a)
a.sort(axis=1)
# View each row as a single opaque (void) element, then flatten to 1-D
b = numpy.ascontiguousarray(a).view(
    numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
).ravel()
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
print(ct[inv] == 1)

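Sorting is fast and makes sure that the edges [i, j] and [j, i] in the original array are identified with each other. This is much faster than frozensets or tuples.
Row uniquification inspired by https://stackoverflow.com/a/16973510/353337.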
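On NumPy 1.13 or later, `numpy.unique` also accepts `axis=0` and deduplicates rows directly, which avoids the `void` view. The following is a sketch of that alternative, not the code benchmarked in this answer; the name `unique_rows_mask` is hypothetical, and its speed relative to the version above has not been measured here.

```python
import numpy


def unique_rows_mask(a):
    """True where the unordered pair in that row occurs exactly once."""
    # numpy.sort returns a sorted copy, so [2, 1] becomes [1, 2].
    arr = numpy.sort(numpy.asarray(a), axis=1)
    # Deduplicate whole rows; inv maps each row to its unique row.
    _, inv, ct = numpy.unique(arr, axis=0, return_inverse=True, return_counts=True)
    # Flatten defensively: the shape of inv has varied across NumPy versions.
    return ct[inv.reshape(-1)] == 1


a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
print(unique_rows_mask(a).tolist())  # [False, False, False, True, False]
```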

Speed comparison for different array sizes:

The plot was created with

from collections import Counter
import numpy
import perfplot


def fs(a):
    ctr = Counter(frozenset(x) for x in a)
    b = [ctr[frozenset(x)] == 1 for x in a]
    return b


def with_numpy(a):
    a = numpy.array(a)
    a.sort(axis=1)
    b = numpy.ascontiguousarray(a).view(
        numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
    ).ravel()
    _, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
    res = ct[inv] == 1
    return res


perfplot.save(
    "out.png",
    setup=lambda n: numpy.random.randint(0, 10, size=(n, 2)),
    kernels=[fs, with_numpy],
    labels=["frozenset", "numpy"],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(a)",
)
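If perfplot isn't installed, a rough comparison can be made with the standard library's `timeit`. This sketch redefines the two kernels from the script above so that it runs on its own; `benchmark` is a hypothetical helper, and it only reports timings, so no particular speedup is implied.

```python
import timeit
from collections import Counter

import numpy


def fs(a):
    ctr = Counter(frozenset(x) for x in a)
    return [ctr[frozenset(x)] == 1 for x in a]


def with_numpy(a):
    a = numpy.array(a)
    a.sort(axis=1)
    # View each row as one opaque void element, then flatten to 1-D.
    b = numpy.ascontiguousarray(a).view(
        numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
    ).ravel()
    _, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
    return ct[inv] == 1


def benchmark(n, number=10, seed=0):
    """Best-of-3 seconds per call for each kernel on a random n x 2 array."""
    rng = numpy.random.default_rng(seed)
    data = rng.integers(0, 10, size=(n, 2))
    # Sanity check: both kernels must agree before they are timed.
    assert with_numpy(data).tolist() == fs(data)
    return {
        name: min(timeit.repeat(lambda: kernel(data), number=number, repeat=3)) / number
        for name, kernel in [("frozenset", fs), ("numpy", with_numpy)]
    }


for n in [2 ** k for k in range(0, 15, 4)]:
    print(n, benchmark(n))
```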

Answer 3

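You can scan the list from start to end while maintaining a map from each pair encountered to the position where it was first seen. Whenever you process a pair, you check whether you've encountered it before. If so, the entries in b at both the index of the first encounter and the current index must be set to False. Otherwise, we just add the current index to the map of encountered pairs and leave b unchanged. b starts out as all True. To treat [1,2] and [2,1] as the same, I first simply sort the pair to get a stable representation. The code would look something like this: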

def proc(a):
    b = [True] * len(a)  # every pair starts out marked as unique
    seen = {}  # canonical pair -> index of its first occurrence
    for idx, p in enumerate(a):
        # Sort the pair so that [1, 2] and [2, 1] share the key (1, 2)
        pp = (min(p), max(p))
        if pp in seen:
            # We've found the element once previously.
            # Need to mark both it and the current value as False.
            # If we encounter pp multiple times, we'll set the initial
            # value to False multiple times, but that's not an issue.
            b[seen[pp]] = False
            b[idx] = False
        else:
            # This is the first time we encounter pp, so we just add it
            # to the map for possible later encounters, but don't affect
            # b at all.
            seen[pp] = idx
    return b

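The time complexity is O(len(a)), which is good, but the space complexity is also O(len(a)) (for the map of encountered pairs), so that might not be so great. Depending on how much flexibility you have, you could use an approximate filter such as a Bloom filter.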

Answer 4

a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]


def occurrences(el):
    # Count el in both orders; a pair [x, x] is its own reverse, so count it once
    return a.count(el) + (a.count(el[::-1]) if el[0] != el[1] else 0)


result = list(filter(lambda el: occurrences(el) == 1, a))
bool_res = [occurrences(el) == 1 for el in a]
print(result)
print(bool_res)

Gives:

[[3, 5]]
[False, False, False, True, False]
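Each `a.count(...)` call scans the whole list, so this approach does O(n²) work in total, which gets slow for the large lists the question mentions. Below is a sketch of the same check done in a single counting pass. It swaps in a sorted tuple as the order-insensitive key; that key choice is an assumption of this sketch, not part of the answer.

```python
from collections import Counter

a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]

# Canonical key: the pair with its two elements in ascending order.
keys = [tuple(sorted(el)) for el in a]
counts = Counter(keys)  # one pass over a instead of a.count per element

result = [el for el, k in zip(a, keys) if counts[k] == 1]
bool_res = [counts[k] == 1 for k in keys]
print(result)    # [[3, 5]]
print(bool_res)  # [False, False, False, True, False]
```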

Answer 5

Use a dictionary for an O(n) solution.

a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]

seen = {}  # canonical pair -> list of indices where it occurs in a
boolList = []

# Iterate through a
for i in range(len(a)):

    # Assume that this element is not a duplicate;
    # this True is stored at index i of boolList
    boolList.append(True)

    # Set elem to the current pair in the list
    elem = a[i]

    # Use the pair in ascending order as the key, so that
    # [1, 2] and [2, 1] map to the same key
    if elem[0] <= elem[1]:
        key = (elem[0], elem[1])
    else:
        key = (elem[1], elem[0])

    # If this pair has not been seen yet, add it as a key to the dictionary,
    # with a list containing its index in a as the value
    if key not in seen:
        seen[key] = [i]
    # If this pair is a duplicate, add the new index to its list of indices
    else:
        seen[key].append(i)

        # Mark this index as a duplicate
        boolList[i] = False

        # If this is the first duplicate of the pair, mark the first
        # occurrence of the pair as a duplicate as well
        if len(seen[key]) == 2:
            boolList[seen[key][0]] = False

print(a)
print(boolList)
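The same index-grouping idea can be written more compactly with `collections.defaultdict`: first group the indices for each canonical (ascending) pair, then mark as unique only the pairs seen at exactly one index. A sketch; the name `unique_flags` is hypothetical.

```python
from collections import defaultdict


def unique_flags(pairs):
    # Map each canonical (ascending) pair to the indices where it occurs.
    positions = defaultdict(list)
    for i, (x, y) in enumerate(pairs):
        positions[(x, y) if x <= y else (y, x)].append(i)

    flags = [False] * len(pairs)
    # Only pairs seen at exactly one index are unique.
    for indices in positions.values():
        if len(indices) == 1:
            flags[indices[0]] = True
    return flags


a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
print(unique_flags(a))  # [False, False, False, True, False]
```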

5 answers

Answer 1: score 15, 2016-07-04 14:48:59
Answer 2: score 8 (accepted), 2016-07-04 19:44:44
Answer 3: score 2, 2016-07-04 14:49:38
Answer 4: score 2, 2016-07-04 14:52:57
Answer 5: score -1, 2016-07-09 22:38:55
