Python - building a sublist that meets certain criteria from a huge number of combinations

Question

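I've been reading for a long time, and this is the first time I haven't been able to find an answer to something I'm working on.

I have a list of 93 strings, each 6 characters long. From those 93 strings, I want to identify a set of 20 that all meet a particular criterion relative to the others in the set. While itertools.combinations will give me every possible combination, not all of them are worth checking.

For instance, if [list[0], list[1], etc.] fails because list[0] and list[1] can't be together, then it doesn't matter what the other 18 strings are: it will fail every time, and that's a ton of wasted checking.

Currently I have this working with 20 nested for loops, but it seems like there has to be a better/faster way to do it: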

for n1 in bclist:
    building = [n1]
    n2bclist = [bc for bc in bclist if bc not in building]
    for n2 in n2bclist:              #this is the start of what gets repeated 19 times
        building.append(n2)
        if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
            building.remove(n2)
            continue
        n3bclist = [bc for bc in bclist if bc not in building]
        #insert the additional 19 for loops, with n3 in n3, n4 in n4, etc
        building.remove(n2)

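In the 20th loop there are print statements that alert me if a set of 20 even exists. The for statements at least let me skip sets early when a single addition fails, but there is no memory of when larger combinations fail:

For instance, [list[0], list[1]] fails, so it skips to [list[0], list[2]], which passes. Next is [list[0], list[2], list[1]], which fails because 0 and 1 are together again, so it moves on to [list[0], list[2], list[3]], which may or may not pass. My concern is that eventually it will also test: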

[list[0], list[3], list[2]]
[list[2], list[0], list[3]]
[list[2], list[3], list[0]]
[list[3], list[0], list[2]]
[list[3], list[2], list[0]]

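All of these combinations will have the same outcome as the previous ones. Basically I'm trading one devil for another: itertools.combinations tests every combination, including sets I already know will fail because of their early values, while the for loops treat the order of the values as a factor when I don't care about order. Both methods significantly increase the time my code takes to complete.

Any ideas on how to get rid of the devils would be greatly appreciated.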

Answer 1

Use your current method, but keep track of indices as well, so that in the inner loops you can skip the elements you've already checked:

bcenum = list(enumerate(bclist))
for i1, n1 in bcenum:
    building = [n1]
    for i2, n2 in bcenum[i1+1:]:              #this is the start of what gets repeated 19 times
        building.append(n2)
        if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
            building.remove(n2)
            continue
        for i3, n3 in bcenum[i2+1:]:
            pass  # more nested loops go here, each slicing from its own index + 1
        building.remove(n2)

Answer 2

def gen(l, n, test, prefix=()):
  if n == 0:
    yield prefix
  else:
    for i, el in enumerate(l):
      if not test(prefix + (el,)):
        for sub in gen(l[i+1:], n - 1, test, prefix + (el,)):
          yield sub

def test(l):
  return sum(l) % 3 == 0 # just a random example for testing

print(list(gen(range(5), 3, test)))

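This selects subsets of cardinality n from l such that test(subset) == False, abandoning a branch as soon as one of its prefixes fails.

It tries to avoid unnecessary work. However, given that there are roughly 10^20 ways of choosing 20 elements out of 93, you may need to rethink your overall approach.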
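To get a feel for the size of the search space, the count can be computed directly. The snippet below is a rough illustration; the rate of one million checks per second is an assumed figure, not a measurement.

```python
import math

total = math.comb(93, 20)      # number of unordered 20-element subsets of 93 items
print(f"{total:.3e} subsets")

checks_per_second = 1_000_000  # assumed rate, purely illustrative
years = total / checks_per_second / (3600 * 24 * 365)
print(f"{years:.1e} years at {checks_per_second:,} checks per second")
```

Early pruning can cut this down enormously, but only if the criterion rejects most small prefixes.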

Answer 3

You can take advantage of two aspects of the problem:

Order doesn't matter
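If test_function(L) is True (L fails), then test_function is also True for any list that contains L; equivalently, every sublist of a passing list also passes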

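You can also simplify things a bit by dealing with the indices 0-92 rather than list[0]-list[92]; it's only inside test_function that we might care what the contents of the list are.

The code below does this by first finding the viable pairs, then sets of four, sets of eight and sets of sixteen. Finally, it finds all viable combinations of a sixteen and a four to get your lists of 20. However, there were over 100,000 sets of eight, so it was still too slow and I gave up. Possibly you could do something along the same lines but speed it up with itertools, though probably not enough.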

target = range(5, 25)

def test_function(L):
    # True means the list fails, False means it passes
    for i in L:
        if i not in target:
            return True
    return False

def possible_combos(A, B):
    """
    Find all possible pairings of a list within A and a list within B
    """
    R = []
    for i in A:
        for j in B:
            if i[-1] < j[0] and not test_function(i + j):
                R.append(i + j)
    return R

def possible_doubles(A):
    """
    Find all possible pairings of two lists within A
    """
    R = []
    for n, i in enumerate(A):
        for j in A[n + 1:]:
            if i[-1] < j[0] and not test_function(i + j):
                R.append(i + j)
    return R

# First, find all pairs that are okay
L = range(93)  # indices 0-92
pairs = []
for i in L:
    for j in L[i + 1:]:
        if not test_function([i, j]):
            pairs.append([i, j])

# Then, all pairs of pairs
quads = possible_doubles(pairs)
print("fours", len(quads), quads[0])
# Then all sets of eight, and sixteen
eights = possible_doubles(quads)
print("eights", len(eights), eights[0])
sixteens = possible_doubles(eights)
print("sixteens", len(sixteens), sixteens[0])

# Finally check all possible combinations of a sixteen plus a four
possible_solutions = possible_combos(sixteens, quads)
print(len(possible_solutions), possible_solutions[0])

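EDIT: I've found a much better solution. First, identify all pairs of values within the range (0-92) that conform to test_function, keeping the pairs in order. Presumably the first value of the first pair must be the first value of the solution, and the second value of the last pair must be the last value of the solution (but check: is this assumption true for your test_function? If it isn't a safe assumption, you will need to repeat find_paths for all possible start and end values). Then find a path from the first value to the last value that is 20 values long and also conforms to test_function.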

def test_function(S):
    for i in S:
        if not i in target:
            return True
    return False

def find_paths(p, f):
    """ Find paths from end of p to f, check they are the right length,
        and check they conform to test_function
    """
    successful = []
    if p[-1] in pairs_dict:
        for n in pairs_dict[p[-1]]:
            p2 = p + [n]
            if (n == f and len(p2) == target_length and
                not test_function(p2)):
                successful.append(p2)
            else:
                successful += find_paths(p2, f)
    return successful

list_length = 93              # this is the number of possible elements
target = [i * 2 for i in range(5, 25)] 
    # ^ this is the unknown target list we're aiming for...
target_length = len(target)   # ... we only know its length
L = range(list_length)        # indices 0-92
pairs = []
for i in L:
    for j in L[i + 1:]:
        if not test_function([i, j]):
            pairs.append([i, j])
firsts = [a for a, b in pairs]
nexts = [[b for a, b in pairs if a == f] for f in firsts]
pairs_dict = dict(zip(firsts, nexts))
print("Found solution(s):", find_paths([pairs[0][0]], pairs[-1][1]))
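As a side note, the `firsts` / `nexts` / `zip` construction rescans `pairs` once for every pair. A one-pass alternative, shown here as a sketch on a tiny hand-made `pairs` list rather than the real data, builds the same mapping with `collections.defaultdict`:

```python
from collections import defaultdict

pairs = [[0, 1], [0, 2], [1, 2]]  # tiny illustrative input, already in order

# One pass: append each second value under its first value.
grouped = defaultdict(list)
for a, b in pairs:
    grouped[a].append(b)
pairs_dict = dict(grouped)  # plain dict, so missing keys are not created on lookup

# The construction used above, for comparison.
firsts = [a for a, b in pairs]
nexts = [[b for a, b in pairs if a == f] for f in firsts]
original = dict(zip(firsts, nexts))

print(pairs_dict == original)
```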

Answer 4

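You should base your solution on itertools.combinations, as that takes care of the ordering issue; short-circuit filtering is relatively easy to solve.

Recursive solution

Let's quickly review how combinations can be implemented; the simplest way is to take the nested-for-loop approach and convert it to recursive style: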

def combinations(iterable, r):
    pool = tuple(iterable)
    for i in range(0, len(pool)):
        for j in range(i + 1, len(pool)):
            ...
                yield (i, j, ...)

Converted to recursive form:

def combinations(iterable, r):
    pool = tuple(iterable)
    def inner(start, k, acc):
        if k == r:
            yield acc
        else:
            for i in range(start, len(pool)):
                for t in inner(i + 1, k + 1, acc + (pool[i], )):
                    yield t
    return inner(0, 0, ())

Applying a filter is now easy:

def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    def inner(start, k, acc):
        if predicate(acc):
            return
        elif k == r:
            yield acc
        else:
            for i in range(start, len(pool)):
                for t in inner(i + 1, k + 1, acc + (pool[i], )):
                    yield t
    return inner(0, 0, ())

Let's check it:

>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]
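Because `inner` calls the predicate on every prefix, a pairwise criterion only has to compare the newest element with the ones before it. The sketch below illustrates this on four short strings. The `conflicts` rule (two strings clash if they differ in fewer than two positions) and the `fails` helper are hypothetical, invented for the example; the question never states the real criterion.

```python
def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    def inner(start, k, acc):
        if predicate(acc):
            return
        elif k == r:
            yield acc
        else:
            for i in range(start, len(pool)):
                yield from inner(i + 1, k + 1, acc + (pool[i],))
    return inner(0, 0, ())


def conflicts(a, b):
    # Hypothetical rule: two strings clash if they differ in fewer than 2 positions.
    return sum(x != y for x, y in zip(a, b)) < 2


def fails(acc):
    # Every shorter prefix already passed, so only the newest item needs checking.
    return len(acc) > 1 and any(conflicts(acc[-1], old) for old in acc[:-1])


pool = ["AAAA", "AAAT", "TTTT", "CCCC"]
result = list(combinations_filterfalse(fails, pool, 3))
print(result)
```

Here "AAAA" and "AAAT" clash, so no triple containing both is ever built, let alone tested.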

Iterative solution

The actual implementation of itertools.combinations given in the docs uses an iterative loop:

def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

To fit the predicate in elegantly, the loop has to be reordered slightly:

def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n or predicate(()):
        return
    elif r == 0:
        yield ()
        return
    indices, i = list(range(r)), 0
    while True:
        while indices[i] + r <= i + n:
            t = tuple(pool[k] for k in indices[:i+1])
            if predicate(t):
                indices[i] += 1
            elif len(t) == r:
                yield t
                indices[i] += 1
            else:
                indices[i+1] = indices[i] + 1
                i += 1
        if i == 0:
            return
        i -= 1
        indices[i] += 1

Checking again:

>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: t == (1, 4), range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: t[-1:] == (3,), range(5), 2))
[(0, 1), (0, 2), (0, 4), (1, 2), (1, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 0))
[()]
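A broader check is to compare the iterative version against a brute-force filter over `itertools.combinations`. The sketch below assumes Python 3 (hence `list(range(r))`) and that the intended semantics are "keep a combination only if none of its prefixes, including the empty one, is rejected"; `brute_force` is a helper written for this comparison, not part of the answer.

```python
from itertools import combinations

def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n or predicate(()):
        return
    elif r == 0:
        yield ()
        return
    indices, i = list(range(r)), 0  # list() so the indices can be mutated
    while True:
        while indices[i] + r <= i + n:
            t = tuple(pool[k] for k in indices[:i+1])
            if predicate(t):
                indices[i] += 1
            elif len(t) == r:
                yield t
                indices[i] += 1
            else:
                indices[i+1] = indices[i] + 1
                i += 1
        if i == 0:
            return
        i -= 1
        indices[i] += 1

def brute_force(predicate, iterable, r):
    # Keep a combination only if no prefix of it (lengths 0..r) is rejected.
    return [c for c in combinations(iterable, r)
            if not any(predicate(c[:k]) for k in range(r + 1))]

predicates = [
    lambda t: sum(t) % 2 == 1,
    lambda t: t == (1, 4),
    lambda t: bool(t) and t[-1] == 3,
    lambda t: False,
]
mismatches = [
    (p_index, r)
    for p_index, predicate in enumerate(predicates)
    for r in range(7)
    if list(combinations_filterfalse(predicate, range(6), r))
    != brute_force(predicate, range(6), r)
]
print(mismatches)
```

An empty `mismatches` list means the two agree, in both content and order, for every predicate and every r from 0 to 6 on a pool of six items.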

Comparison

It turns out that the recursive solution is not only simpler, but also faster:

In [33]: timeit list(combinations_filterfalse_rec(lambda t: False, range(20), 5))
10 loops, best of 3: 24.6 ms per loop

In [34]: timeit list(combinations_filterfalse_it(lambda t: False, range(20), 5))
10 loops, best of 3: 76.6 ms per loop

4 answers

Answer 1
1, accepted, 2012-12-11 19:08:52

Answer 2
1, 2012-12-11 19:12:21

Answer 3
0, 2012-12-11 22:00:51

Answer 4
0, 2012-12-12 08:50:10
