Increasing number of permutations of all possible combinations of a list with repetitions allowed
I am trying, without success, to understand how to use itertools to generate a list of all possible combinations of the elements of a list, with an increasing number of elements picked and with repetitions allowed. I would also like to add a separator:
lis = ['a','b','c']
separator = '/'
total_number_of_combinations = 3
permutation_list = ['a','b','c', 'a/a', 'a/b', 'a/c', 'b/a', 'b/b', 'b/c', 'c/a', 'c/b', 'c/c',
'a/a/a', 'a/a/b', 'a/a/c', 'a/b/a', 'a/b/b', 'a/b/c', 'a/c/a', 'a/c/b', 'a/c/c'
'b/a/a', 'b/a/b', 'b/a/c', 'b/b/a', 'b/b/b', 'b/b/c', 'b/c/a', 'b/c/b', 'b/c/c'
'c/a/a', 'c/a/b', 'c/a/c', 'c/b/a', 'c/b/b', 'c/b/c', 'c/c/a', 'c/c/b', 'c/c/c']
The list will then have len(lis) + len(lis)**2 + len(lis)**3 + ... + len(lis)**n elements, with n = total_number_of_combinations. I need to keep the separator and total_number_of_combinations changeable.
I need this in a list that can be checked as a condition for filtering a pandas DataFrame (I will check dt[dt.my_col.isin(permutation_list)]).
I would appreciate any help, a pointer to a duplicate topic, or even an explanation of how to state this problem correctly, because I did not find any topic that answers this question (maybe I am using the wrong keywords...). Maybe there is also a function in another module that does this, but I don't know of one.
UPDATE: Following the request of @Scott, here is my real case:
lis = ['BRUTELE','COCKPIT EST', 'CIRCET']
separator = ' / '
total_number_of_combinations = 10
so my final list needs to have 88572 elements.
Are you looking for this?
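As a quick sanity check on that figure, the element count can be computed directly from the sum given above. This is only a sketch; count_entries is a hypothetical helper name, not something from itertools.

```python
def count_entries(n_items: int, max_len: int) -> int:
    """Number of separator-joined sequences of length 1..max_len
    drawn from n_items elements, with repetition and order mattering."""
    return sum(n_items ** k for k in range(1, max_len + 1))


print(count_entries(3, 3))   # 39
print(count_entries(3, 10))  # 88572
```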
import itertools
['/'.join(a) for i in range(1, 3) for a in itertools.combinations_with_replacement('ABCDEF', i)]
Result is:
['A', 'B', 'C', 'D', 'E', 'F',
'A/A', 'A/B', 'A/C', 'A/D', 'A/E', 'A/F', 'B/B', 'B/C', 'B/D', 'B/E', 'B/F', 'C/C', 'C/D', 'C/E', 'C/F', 'D/D', 'D/E', 'D/F', 'E/E', 'E/F', 'F/F']
The following will give you 39 entries:
import itertools
['/'.join(a) for i in range(1, 4) for a in itertools.product(['a', 'b', 'c'], repeat=i)]
Please note that your reference list contains only 35 of these entries, since {'a/c/c', 'b/a/a', 'b/c/c', 'c/a/a'} are missing. I see no reason for these to be excluded, so I assume it is an error in your list; the missing commas at the ends of the 'a/...' and 'b/...' rows would explain it, because Python concatenates adjacent string literals.
What you are trying to get is a product, not a combination.
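To see where the discrepancy comes from, the reference list from the question can be compared with the generated product. The sketch below copies that list verbatim, including the two missing commas; the names reference, expected, missing and extra are chosen here for illustration only.

```python
import itertools

# Reference list as written in the question (note: no comma after
# 'a/c/c' and 'b/c/c', so Python fuses them with the next literal).
reference = ['a','b','c', 'a/a', 'a/b', 'a/c', 'b/a', 'b/b', 'b/c', 'c/a', 'c/b', 'c/c',
 'a/a/a', 'a/a/b', 'a/a/c', 'a/b/a', 'a/b/b', 'a/b/c', 'a/c/a', 'a/c/b', 'a/c/c'
 'b/a/a', 'b/a/b', 'b/a/c', 'b/b/a', 'b/b/b', 'b/b/c', 'b/c/a', 'b/c/b', 'b/c/c'
 'c/a/a', 'c/a/b', 'c/a/c', 'c/b/a', 'c/b/b', 'c/b/c', 'c/c/a', 'c/c/b', 'c/c/c']

expected = ['/'.join(p) for i in range(1, 4)
            for p in itertools.product('abc', repeat=i)]

missing = sorted(set(expected) - set(reference))  # wanted but absent
extra = sorted(set(reference) - set(expected))    # fused literals

print(len(expected))  # 39
print(missing)        # ['a/c/c', 'b/a/a', 'b/c/c', 'c/a/a']
print(extra)          # ['a/c/cb/a/a', 'b/c/cc/a/a']
```

So the list as typed holds 35 valid entries plus two fused strings; adding the two commas restores all 39.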
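import itertools

lis = ["BRUTELE", "COCKPIT EST", "CIRCET"]
separator = " / "
total_number_of_combinations = 3
# For each length i from 1 to the maximum, join every ordered tuple
# of i elements (repetition allowed) with the separator.
result = list(
    itertools.chain.from_iterable(
        (separator.join(a) for a in itertools.product(lis, repeat=i))
        for i in range(1, total_number_of_combinations + 1)
    )
)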
assert result == ['BRUTELE', 'COCKPIT EST', 'CIRCET', 'BRUTELE / BRUTELE', 'BRUTELE / COCKPIT EST', 'BRUTELE / CIRCET', 'COCKPIT EST / BRUTELE', 'COCKPIT EST / COCKPIT EST', 'COCKPIT EST / CIRCET', 'CIRCET / BRUTELE', 'CIRCET / COCKPIT EST', 'CIRCET / CIRCET', 'BRUTELE / BRUTELE / BRUTELE', 'BRUTELE / BRUTELE / COCKPIT EST', 'BRUTELE / BRUTELE / CIRCET', 'BRUTELE / COCKPIT EST / BRUTELE', 'BRUTELE / COCKPIT EST / COCKPIT EST', 'BRUTELE / COCKPIT EST / CIRCET', 'BRUTELE / CIRCET / BRUTELE', 'BRUTELE / CIRCET / COCKPIT EST', 'BRUTELE / CIRCET / CIRCET', 'COCKPIT EST / BRUTELE / BRUTELE', 'COCKPIT EST / BRUTELE / COCKPIT EST', 'COCKPIT EST / BRUTELE / CIRCET', 'COCKPIT EST / COCKPIT EST / BRUTELE', 'COCKPIT EST / COCKPIT EST / COCKPIT EST', 'COCKPIT EST / COCKPIT EST / CIRCET', 'COCKPIT EST / CIRCET / BRUTELE', 'COCKPIT EST / CIRCET / COCKPIT EST', 'COCKPIT EST / CIRCET / CIRCET', 'CIRCET / BRUTELE / BRUTELE', 'CIRCET / BRUTELE / COCKPIT EST', 'CIRCET / BRUTELE / CIRCET', 'CIRCET / COCKPIT EST / BRUTELE', 'CIRCET / COCKPIT EST / COCKPIT EST', 'CIRCET / COCKPIT EST / CIRCET', 'CIRCET / CIRCET / BRUTELE', 'CIRCET / CIRCET / COCKPIT EST', 'CIRCET / CIRCET / CIRCET']
However, the number of resulting items is the sum of len(lis) ** i for i from 1 to total_number_of_combinations, which is dominated by len(lis) ** total_number_of_combinations and may be computationally expensive. A better method would be to split each string at the separator and test the membership of each resulting part.
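A minimal sketch of that split-and-check idea follows. The function name is_valid and its parameters are hypothetical, and it assumes that no allowed element itself contains the separator.

```python
def is_valid(value, allowed, separator, max_parts):
    """Return True if value is 1..max_parts allowed elements joined by separator."""
    if not isinstance(value, str):
        return False  # e.g. NaN or None in a DataFrame column
    parts = value.split(separator)
    return len(parts) <= max_parts and all(p in allowed for p in parts)


allowed = {"BRUTELE", "COCKPIT EST", "CIRCET"}
separator = " / "

print(is_valid("BRUTELE / CIRCET", allowed, separator, 10))  # True
print(is_valid("BRUTELE / OTHER", allowed, separator, 10))   # False
print(is_valid("BRUTELE/CIRCET", allowed, separator, 10))    # False
```

With the DataFrame from the question, this could replace the isin check with something like dt[dt.my_col.map(lambda v: is_valid(v, allowed, separator, 10))], which avoids building the 88572-element list at all.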