How to find a set of values from lists without any repeats

Question

You have a list of lists in Python, something like this:

l = [[ 1,  2,  3],
     [18, 20, 22],
     [ 3, 14, 16],
     [ 1,  3,  5],
     [18,  2, 16]]

How would you go about selecting one value from each sub-list, such that no single value is repeated, and the sum of the resulting list is minimised?

result = [1, 18, 3, 5, 2]

Answer 1

Here's a compact brute-force solution. It has to perform columns**rows tests, which is not good. I suspect that there's a backtracking algorithm that's generally more efficient, but in the worst case all possibilities may need to be checked.

from itertools import product

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

nrows = len(lst) 
m = min((t for t in product(*lst) if len(set(t)) == nrows), key=sum)
print(m)

output

(1, 18, 3, 5, 2)
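
To make the columns**rows cost concrete, here is a small sketch that counts how many tuples product generates for the example list and how many of them survive the no-duplicates filter. The helper name count_candidates and the variables total and valid are illustrative and not part of the solution above.

```python
from itertools import product

def count_candidates(lst):
    """Return (total, valid): every tuple product() yields, and those with no repeated value."""
    nrows = len(lst)
    total = valid = 0
    for t in product(*lst):
        total += 1
        # A tuple is a valid selection only if all of its values are distinct
        if len(set(t)) == nrows:
            valid += 1
    return total, valid

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

total, valid = count_candidates(lst)
# total is 3**5 == 243 for this 5-row, 3-column input
print(total, valid)
```

Every one of the total tuples is built and tested, even though only valid of them can ever be the answer.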

Here's a faster version that uses a recursive generator instead of itertools.product.

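def select(data, seq):
    # Recursively extend each duplicate-free selection from the earlier rows
    # with every value of the last row that it doesn't already contain.
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        # No rows left: yield the starting selection unchanged
        yield seq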

def solve(data):
    return min(select(data, []), key=sum)
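
As a quick check of the recursive generator, the sketch below repeats select and solve so that it is self-contained, runs them on the example list, and compares the selections the generator yields with those that pass the product-based filter. The variable names from_generator and from_product are illustrative.

```python
from itertools import product

def select(data, seq):
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve(data):
    return min(select(data, []), key=sum)

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

# The generator only ever builds duplicate-free selections...
from_generator = sorted(tuple(s) for s in select(lst, []))
# ...which are the same selections that survive the brute-force filter.
from_product = sorted(t for t in product(*lst) if len(set(t)) == len(lst))

print(from_generator == from_product)
print(solve(lst))  # a list rather than a tuple
```

The speed-up comes from dropping a partial selection as soon as it would contain a repeat, so none of its extensions are ever generated.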

Here's a modified version of the recursive generator that sorts as it goes, but of course that's slower, and it consumes more RAM. If the input data is sorted it usually finds the minimum solution quite rapidly, but I can't figure out a foolproof way of getting it to stop when it's found the minimum selection.

def select(data, selected):
    if data:
        for selected in sorted(select(data[:-1], selected), key=sum):
            for u in data[-1]:
                if u not in selected:
                    yield selected + [u]
    else:
        yield selected
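
One possible way to stop early is branch and bound, which is a different technique from the generators above and is offered only as a sketch. It searches depth-first and abandons a partial selection once its sum, plus the smallest value of each remaining row, can no longer beat the best complete selection found so far. The names solve_bnb, suffix_min and search are hypothetical, and the code assumes that every row is non-empty.

```python
def solve_bnb(data):
    """Return a minimum-sum selection of distinct values, one per row, or None."""
    rows = [sorted(row) for row in data]  # assumes no row is empty

    # suffix_min[i] is the sum of the smallest value of rows i, i+1, ...
    # It ignores the no-repeats rule, so it never overestimates the real cost.
    suffix_min = [0] * (len(rows) + 1)
    for i in range(len(rows) - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + rows[i][0]

    # A dict instead of nonlocal, so this also runs on Python 2
    best = {'sum': None, 'sel': None}

    def search(i, chosen, total):
        if i == len(rows):
            if best['sum'] is None or total < best['sum']:
                best['sum'], best['sel'] = total, list(chosen)
            return
        for u in rows[i]:
            bound = total + u + suffix_min[i + 1]
            if best['sum'] is not None and bound >= best['sum']:
                break  # the row is sorted, so later values can't do better
            if u not in chosen:
                chosen.append(u)
                search(i + 1, chosen, total + u)
                chosen.pop()

    search(0, [], 0)
    return best['sel']

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]
print(solve_bnb(lst))
```

In the worst case this still visits every duplicate-free selection, but the bound lets it skip whole branches once a good selection has been found.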

Here's some timing code that compares the speed of Maurice's and my solutions. It runs on Python 2 and Python 3. I get similar timing results on Python 2.6 and Python 3.6 on my old 2GHz 32-bit machine running an oldish Debian derivative of Linux.

from __future__ import print_function, division
from timeit import Timer
from itertools import product
from random import seed, sample, randrange

n = randrange(0, 1 << 32)
print('seed', n)
seed(n)

def show(data):
    indent = ' ' * 4
    s = '\n'.join(['{0}{1},'.format(indent, row) for row in data])
    print('[\n{0}\n]\n'.format(s))

def make_data(rows, cols):
    maxn = rows * cols
    nums = range(1, maxn)
    return [sample(nums, cols) for _ in range(rows)]

def sort_data(data):
    newdata = [sorted(row) for row in data]
    newdata.sort(reverse=True, key=sum)
    return newdata

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def solve_Maurice(data):
    result = None
    for item in product(*data):
        if len(item) > len(set(item)):
            # Try the next combination if there are duplicates
            continue
        if result is None or sum(result) > sum(item):
            result = item
    return result

def solve_prodgen(data):
    rows = len(data) 
    return min((t for t in product(*data) if len(set(t)) == rows), key=sum)

def select(data, seq):
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve_recgen(data):
    return min(select(data, []), key=sum)

funcs = (
    solve_Maurice,
    solve_prodgen,
    solve_recgen,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify():
    for func in funcs:
        fname = func.__name__
        seq = func(data)
        print('{0:14} {1}'.format(fname, seq))
    print()

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import data, ' + fname
        cmd = fname + '(data)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:14} {1}'.format(fname, result))

rows, cols = 6, 4
print('Number of selections:', cols ** rows)

data = make_data(rows, cols)
data = sort_data(data)
show(data)

verify()

loops, reps = 100, 3
time_test(loops, reps)

typical output

seed 22290
Number of selections: 4096
[
    [6, 11, 22, 23],
    [9, 14, 17, 19],
    [5, 9, 16, 22],
    [5, 6, 9, 13],
    [1, 3, 6, 22],
    [4, 5, 6, 13],
]

solve_Maurice  (11, 9, 5, 6, 1, 4)
solve_prodgen  (11, 9, 5, 6, 1, 4)
solve_recgen   [11, 9, 5, 6, 1, 4]

solve_recgen   [0.5476037560001714, 0.549133045002236, 0.5647858490046929]
solve_prodgen  [1.2500368960027117, 1.296529343999282, 1.3022710209988873]
solve_Maurice  [1.485518219997175, 1.489505891004228, 1.784105566002836]

Answer 2

EDIT: My previous solution only worked in most cases; this should do the trick in all cases:

from itertools import product
l = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]

result = None
for item in product(*l):
    if len(item) > len(set(item)):
        # Try the next combination if there are duplicates
        continue
    if result is None or sum(result) > sum(item):
        result = item
print(result)

Output

(1, 18, 3, 5, 2)
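
One practical difference from the min()-based one-liner in Answer 1 is what happens when no duplicate-free selection exists. The sketch below wraps both approaches in functions to show it. The names solve_loop, solve_min and impossible are illustrative.

```python
from itertools import product

def solve_loop(data):
    # The explicit loop from this answer, wrapped in a function
    result = None
    for item in product(*data):
        if len(item) > len(set(item)):
            continue
        if result is None or sum(result) > sum(item):
            result = item
    return result

def solve_min(data):
    # The min()-based version from Answer 1
    rows = len(data)
    return min((t for t in product(*data) if len(set(t)) == rows), key=sum)

impossible = [[1], [1]]  # both rows offer only the value 1

print(solve_loop(impossible))  # None: the loop never assigns a result
try:
    solve_min(impossible)
except ValueError:
    # min() raises ValueError when its iterable is empty
    print('no valid selection')
```

Which behaviour is preferable depends on the caller: the loop reports failure with None, while min() needs a try/except around it.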

2 answers

Answer 1
2 2016-11-14 09:20:30

Answer 2
1 2016-11-14 08:33:21
