
How to find a set of values from a list of lists which contains no repetitions

You have a list of lists in Python, something like this:

l = [[ 1,  2,  3],
     [18, 20, 22],
     [ 3, 14, 16],
     [ 1,  3,  5],
     [18,  2, 16]]

How would you go about selecting one value from each sub-list, such that no single value is repeated, and the sum of the resulting list is minimised?

result = [1, 18, 3, 5, 2]

Here's a compact brute-force solution. Because it is brute force, it has to perform columns**rows tests, which is not good. I suspect that there's a backtracking algorithm that's generally more efficient, but in the worst case all possibilities may need to be checked.

from itertools import product

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

nrows = len(lst) 
m = min((t for t in product(*lst) if len(set(t)) == nrows), key=sum)
print(m)

Output

(1, 18, 3, 5, 2)
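
To make the columns**rows figure concrete, here is a small sketch that counts the candidates for the example input. It assumes every row has the same number of columns; the names `candidates` and `valid` are only for illustration.

```python
from itertools import product

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

rows, cols = len(lst), len(lst[0])

# product() builds one tuple for every way of picking one value per row
candidates = list(product(*lst))
print(len(candidates), cols ** rows)  # both are 243 for this 5 x 3 input

# Only the tuples without repeated values are legal selections
valid = [t for t in candidates if len(set(t)) == rows]
print(min(valid, key=sum))
```

Every candidate is built in full before it is checked for duplicates, which is why the cost grows so quickly with the number of rows.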

Here's a faster version that uses a recursive generator instead of itertools.product.

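def select(data, seq):
    # Yield every selection of distinct values, one value per row of data
    if data:
        # Extend each valid selection from the earlier rows with the last row
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                # Skip values that are already in the partial selection
                if u not in seq:
                    yield seq + [u]
    else:
        # No rows left: the starting selection is the only one
        yield seq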

def solve(data):
    return min(select(data, []), key=sum)
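
As a quick check, the sketch below runs the recursive generator on the example data and compares it with the brute-force result. Note that it returns a list rather than a tuple. The variable name `brute` is just for illustration.

```python
from itertools import product

def select(data, seq):
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                # A duplicate is rejected as soon as it appears, so no
                # selection that extends it is ever generated
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve(data):
    return min(select(data, []), key=sum)

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

result = solve(lst)
print(result)  # [1, 18, 3, 5, 2]

# Brute-force result for comparison
brute = min((t for t in product(*lst) if len(set(t)) == len(lst)), key=sum)
print(tuple(result) == brute)  # True
```

The speed-up comes from pruning: partial selections that already contain a repeated value are dropped early instead of being completed and then tested.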

Here's a modified version of the recursive generator that sorts the partial selections by sum as it goes, but of course that's slower, and it consumes more RAM. If the input data is sorted it usually finds the minimum solution quite rapidly, but I can't figure out a foolproof way of getting it to stop once it has found the minimum selection.

def select(data, selected):
    if data:
        for selected in sorted(select(data[:-1], selected), key=sum):
            for u in data[-1]:
                if u not in selected:
                    yield selected + [u]
    else:
        yield selected
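
One possible way to get a search like this to stop early is branch and bound: keep the best total found so far and abandon any partial selection that cannot beat it. The following is only a sketch of that idea, not part of the solutions above. The names `solve_bnb`, `search` and `suffix_min` are hypothetical, and it assumes every row is non-empty.

```python
def solve_bnb(data):
    """Return a minimum-sum selection of distinct values, or None if none exists."""
    rows = [sorted(row) for row in data]

    # suffix_min[i] is the sum of the smallest value in each of rows[i:].
    # It is an optimistic lower bound on what the remaining rows can add,
    # because it ignores the no-repetition rule.
    suffix_min = [0] * (len(rows) + 1)
    for i in range(len(rows) - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + rows[i][0]

    # A dict is used instead of nonlocal so this also runs on Python 2
    best = {'sum': float('inf'), 'seq': None}

    def search(i, chosen, total):
        if i == len(rows):
            # Pruning guarantees that total is below the best sum so far
            best['sum'], best['seq'] = total, list(chosen)
            return
        for u in rows[i]:
            # The row is sorted, so if u cannot beat the best total
            # then no later value in this row can either
            if total + u + suffix_min[i + 1] >= best['sum']:
                break
            if u not in chosen:
                chosen.append(u)
                search(i + 1, chosen, total + u)
                chosen.pop()

    search(0, [], 0)
    return best['seq']

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]
print(solve_bnb(lst))  # [1, 18, 3, 5, 2]
```

The result keeps the original row order, because only the values inside each row are sorted. In the worst case this still visits every valid selection, so it is a heuristic speed-up rather than a guarantee.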

Here's some timing code that compares the speed of Maurice's solution and mine. It runs on Python 2 and Python 3. I get similar timings on Python 2.6 and Python 3.6 on my old 2GHz 32-bit machine running an oldish Debian derivative of Linux.

from __future__ import print_function, division
from timeit import Timer
from itertools import product
from random import seed, sample, randrange

n = randrange(0, 1 << 32)
print('seed', n)
seed(n)

def show(data):
    indent = ' ' * 4
    s = '\n'.join(['{0}{1},'.format(indent, row) for row in data])
    print('[\n{0}\n]\n'.format(s))

def make_data(rows, cols):
    maxn = rows * cols
    nums = range(1, maxn)
    return [sample(nums, cols) for _ in range(rows)]

def sort_data(data):
    newdata = [sorted(row) for row in data]
    newdata.sort(reverse=True, key=sum)
    return newdata

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def solve_Maurice(data):
    result = None
    for item in product(*data):
        if len(item) > len(set(item)):
            # Try the next combination if there are duplicates
            continue
        if result is None or sum(result) > sum(item):
            result = item
    return result

def solve_prodgen(data):
    rows = len(data) 
    return min((t for t in product(*data) if len(set(t)) == rows), key=sum)

def select(data, seq):
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve_recgen(data):
    return min(select(data, []), key=sum)

funcs = (
    solve_Maurice,
    solve_prodgen,
    solve_recgen,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify():
    for func in funcs:
        fname = func.__name__
        seq = func(data)
        print('{0:14} {1}'.format(fname, seq))
    print()

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import data, ' + fname
        cmd = fname + '(data)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:14} {1}'.format(fname, result))

rows, cols = 6, 4
print('Number of selections:', cols ** rows)

data = make_data(rows, cols)
data = sort_data(data)
show(data)

verify()

loops, reps = 100, 3
time_test(loops, reps)

Typical output

seed 22290
Number of selections: 4096
[
    [6, 11, 22, 23],
    [9, 14, 17, 19],
    [5, 9, 16, 22],
    [5, 6, 9, 13],
    [1, 3, 6, 22],
    [4, 5, 6, 13],
]

solve_Maurice  (11, 9, 5, 6, 1, 4)
solve_prodgen  (11, 9, 5, 6, 1, 4)
solve_recgen   [11, 9, 5, 6, 1, 4]

solve_recgen   [0.5476037560001714, 0.549133045002236, 0.5647858490046929]
solve_prodgen  [1.2500368960027117, 1.296529343999282, 1.3022710209988873]
solve_Maurice  [1.485518219997175, 1.489505891004228, 1.784105566002836]

EDIT: My previous solution only worked in most cases; this one should do the trick in all cases:

from itertools import product
l = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]

result = None
for item in product(*l):
    if len(item) > len(set(item)):
        # Try the next combination if there are duplicates
        continue
    if result is None or sum(result) > sum(item):
        result = item
print(result)

Output

(1, 18, 3, 5, 2)

