Find unique pairs in list of pairs
I have a (large) list of lists of integers, e.g.,
a = [
    [1, 2],
    [3, 6],
    [2, 1],
    [3, 5],
    [3, 6]
]
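Most of the pairs appear twice, where the order of the integers doesn't matter (i.e., [1, 2] is equivalent to [2, 1]). I'd now like to find the pairs that appear only once, and get a Boolean list indicating that. For the above example,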
b = [False, False, False, True, False]
Since a is typically large, I'd like to avoid explicit loops. Mapping to frozensets may be advised, but I'm not sure if that's overkill.
from collections import Counter

ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]
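We can use Counter to get the count of each pair (converting each list to a frozenset so that order is ignored), and then check for each pair whether it appears only once.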
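The counting idea above can be sketched as a small self-contained function. The name `unique_mask` is hypothetical, and using `tuple(sorted(p))` as the key instead of `frozenset` is my own variation, not the answer's: for plain pairs both keys behave the same, but the sorted tuple would also stay correct if the inner lists ever held more than two items with repeats.

```python
from collections import Counter


def unique_mask(pairs):
    """Return a list of bools: True where the unordered pair occurs exactly once."""
    # Normalise each pair so that [1, 2] and [2, 1] share one hashable key.
    keys = [tuple(sorted(p)) for p in pairs]
    counts = Counter(keys)
    return [counts[k] == 1 for k in keys]


a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
print(unique_mask(a))  # [False, False, False, True, False]
```

The list comprehensions still loop in Python, so this is mainly a readability aid and not a way around the explicit-loop concern in the question.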
Here's a solution with NumPy that is 10 times faster than the suggested frozenset solution:
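import numpy

a = numpy.array(a)
a.sort(axis=1)
# View each sorted row as one opaque (void) item; flatten to 1-D so the
# final mask is 1-D as well
b = numpy.ascontiguousarray(a).view(
    numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
).ravel()
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
print(ct[inv] == 1)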
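Sorting is fast and makes sure that the edges [i, j] and [j, i] in the original array are identified with each other. It is much faster than frozensets or tuples.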
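To make the `ct[inv] == 1` step easier to follow, here is a minimal sketch of the same sort-then-unique idea. It is a variant and not the answer's code: it assumes a NumPy version in which `numpy.unique` accepts `axis` (1.13 or newer), so the void view is not needed, and the function name `unique_rows_mask` is hypothetical. I have not benchmarked it against the void-view version.

```python
import numpy


def unique_rows_mask(pairs):
    # Sort inside each row so that [i, j] and [j, i] become the same row.
    arr = numpy.sort(numpy.asarray(pairs), axis=1)
    # Assumption: axis=0 makes numpy.unique compare whole rows.
    _, inv, ct = numpy.unique(
        arr, axis=0, return_inverse=True, return_counts=True
    )
    # inv[k] is the position of row k among the unique rows and ct holds how
    # often each unique row occurs, so ct[inv] is the count for every row.
    # reshape(-1) keeps the result 1-D whatever shape inv is returned in.
    return ct[inv.reshape(-1)] == 1


a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
mask = unique_rows_mask(a)
print(mask.tolist())  # [False, False, False, True, False]
```

For the example, the unique sorted rows are [1, 2], [3, 5], [3, 6] with counts 2, 1, 2, and only the row [3, 5] maps to a count of 1.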
Speed comparison for different array sizes:
The plot was created with
from collections import Counter
import numpy
import perfplot


def fs(a):
    ctr = Counter(frozenset(x) for x in a)
    b = [ctr[frozenset(x)] == 1 for x in a]
    return b


def with_numpy(a):
    a = numpy.array(a)
    a.sort(axis=1)
    b = numpy.ascontiguousarray(a).view(
        numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
    )
    _, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
    res = ct[inv] == 1
    return res


perfplot.save(
    "out.png",
    setup=lambda n: numpy.random.randint(0, 10, size=(n, 2)),
    kernels=[fs, with_numpy],
    labels=["frozenset", "numpy"],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(a)",
)
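You could scan the list from start to end while maintaining a map from each encountered pair to the position where it was first seen. Whenever you process a pair, you check whether you have encountered it before. If so, both the index of the first encounter and the index of the current encounter must be set to False in b. Otherwise, we just add the current index to the map of encountered pairs and leave b unchanged. b starts out as all True. To treat [1,2] and [2,1] as the same, I'd first simply sort the pair to obtain a stable representation. The code would look something like this: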
def proc(a):
    b = [True] * len(a)  # Better way to allocate this
    filter = {}  # maps a sorted pair to the index of its first occurrence
    for idx, p in enumerate(a):
        m = min(p)
        M = max(p)
        pp = (m, M)
        if pp in filter:
            # We've found the element once previously
            # Need to mark both it and the current value as "False"
            # If we encounter pp multiple times, we'll set the initial
            # value to False multiple times, but that's not an issue
            b[filter[pp]] = False
            b[idx] = False
        else:
            # This is the first time we encounter pp, so we just add it
            # to the filter for possible later encounters, but don't affect
            # b at all.
            filter[pp] = idx
    return b
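The time complexity is O(len(a)), which is good, but the space complexity is also O(len(a)) (for filter), so this might not be so great. Depending on how flexible you are, you could use an approximate filter such as a Bloom filter.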
a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
# Keep the pairs whose combined count (in either order) is exactly 1
result = list(filter(lambda el: (a.count([el[0], el[1]]) + a.count([el[1], el[0]]) == 1), a))
bool_res = [(a.count([el[0], el[1]]) + a.count([el[1], el[0]]) == 1) for el in a]
print(result)
print(bool_res)
Gives:
[[3, 5]]
[False, False, False, True, False]
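One caveat with the count-based approach: if both elements of a pair are equal, [x, x] is matched by both `a.count` calls, so a lone [4, 4] would be counted twice and reported as not unique. Each `count` call also scans the whole list, so the total work grows quadratically with `len(a)`. A minimal sketch of a guarded variant follows; the name `count_based_mask` is hypothetical, and it assumes the input is a list of two-element lists.

```python
def count_based_mask(pairs):
    mask = []
    for x, y in pairs:
        n = pairs.count([x, y])
        # Only add the reversed pair when it differs from the pair itself,
        # otherwise [x, x] would be counted twice.
        if x != y:
            n += pairs.count([y, x])
        mask.append(n == 1)
    return mask


print(count_based_mask([[4, 4], [1, 2]]))  # [True, True]
```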
Use a dictionary for an O(n) solution.
a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
seen = {}  # renamed from dict to avoid shadowing the built-in type
boolList = []
# Iterate through a
for i in range(len(a)):
    # Assume that this element is not a duplicate
    # This 'True' is added to the corresponding index i of boolList
    boolList += [True]
    # Set elem to the current pair in the list
    elem = a[i]
    # If elem is in ascending order, it will be entered into the map as is
    if elem[0] <= elem[1]:
        key = repr(elem)
    # If not, change it into ascending order so keys can easily be compared
    else:
        key = repr([elem[1]] + [elem[0]])
    # If this pair has not yet been seen, add it as a key to the dictionary
    # with the value a list containing its index in a.
    if key not in seen:
        seen[key] = [i]
    # If this pair is a duplicate, add the new index to the dictionary. The
    # value of the key will be a list containing the indices of that pair in a.
    else:
        # Change the value to contain the new index
        seen[key] += [i]
        # Change boolList to False for this index
        boolList[i] = False
        # If this is the first duplicate for the pair, make the first
        # occurrence of the pair into a duplicate as well.
        if len(seen[key]) == 2:
            boolList[seen[key][0]] = False
print(a)
print(boolList)