

Find all combinations that do not contain a specific string

Is there any way to find all combinations of an array that do not contain specific substrings? For example, with a=['a','b','c'] the length-3 combinations are aaa aab aac aba abb ... ccc

But I don't want the substring ab, so the result should be aaa aac aca acb ... ccc. I use the code below, but for an alphabet of 20 characters and strings of length 13 it takes far too much time.

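import itertools

lista = []

def foo(l):
    # All length-3 tuples over the characters of l (Cartesian product).
    yield from itertools.product(l, repeat=3)

non = ["ab"]  # forbidden substrings

for x in foo('abc'):
    x = ''.join(x)
    # Keep x only if it contains none of the forbidden substrings.
    if not any(j in x for j in non):
        lista.append(x)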
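Apart from speed, the code above keeps every accepted string in `lista`; at the sizes mentioned in the question that list would be far too large to hold in memory unless almost every candidate is rejected. A minimal sketch of the same brute-force filter as a lazy generator, generalized to length `k` (the name `brute_force_without` and its parameters are hypothetical, chosen to mirror the answer's signature). It still tests all len(alphabet)**k candidates, so it only fixes memory use, not running time:

```python
import itertools


def brute_force_without(alphabet, k, avoid):
    """Yield every length-k string over alphabet that contains none of avoid.

    Hypothetical helper: the question's filter, made lazy and generalized to any k.
    """
    for tup in itertools.product(alphabet, repeat=k):
        s = ''.join(tup)
        # Same test as the question: reject s if any forbidden string occurs in it.
        if not any(t in s for t in avoid):
            yield s


# Consume the results one at a time instead of building one big list.
for s in brute_force_without('abc', 3, ['ab']):
    print(s)
```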

Instead of generating all strings and then rejecting the ones that contain a forbidden substring, it is (asymptotically) more efficient to build the strings by backtracking and reject any partial string that already contains a forbidden substring, since none of its extensions can be valid either. We only need to test whether the current partial string ends with a forbidden substring, which is faster than testing whether it contains one: the partial string one character shorter has already passed the test, so any new occurrence must end at the last character.
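When the forbidden list is long, calling `endswith` once per forbidden string at every step adds up. One possible refinement, sketched here under the assumption that `avoid` is a plain list of strings (`make_suffix_checker` is a hypothetical name, not part of the original answer): put the forbidden strings in a set and look up only the suffixes whose lengths actually occur, so each check costs one slice-and-lookup per distinct forbidden length. The returned function is meant as a drop-in for the `any(s.endswith(t) for t in avoid)` test.

```python
def make_suffix_checker(avoid):
    """Return a function telling whether s ends with any string in avoid.

    Hypothetical helper: equivalent to any(s.endswith(t) for t in avoid).
    """
    forbidden = set(avoid)
    lengths = sorted({len(t) for t in forbidden})

    def ends_with_forbidden(s):
        # Only suffixes whose lengths occur in avoid can possibly match.
        return any(len(s) >= n and s[len(s) - n:] in forbidden for n in lengths)

    return ends_with_forbidden


check = make_suffix_checker(['ab', 'cab', 'ba'])
print(check('ccab'))  # True: ends with 'ab' (and 'cab')
print(check('abc'))   # False: 'ab' occurs, but not as a suffix
```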

Here's an implementation using a recursive generator function:

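def strings_without(alphabet, k, avoid):
    def helper(s):
        # s extends a prefix that already passed this test, so a forbidden
        # substring can only appear as a suffix of s.
        if any(s.endswith(t) for t in avoid):
            pass  # prune: no extension of s can be valid
        elif len(s) == k:
            yield s
        else:
            for c in alphabet:
                yield from helper(s + c)
    return helper('')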

Example:

>>> for s in strings_without('abc', 3, ['ab']):
...     print(s)
... 
aaa
aac
aca
acb
acc
baa
bac
bba
bbb
bbc
bca
bcb
bcc
caa
cac
cba
cbb
cbc
cca
ccb
ccc
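The same pruned search can also be written without recursive generators. In the sketch below (`strings_without_iter` is a hypothetical name, not part of the original answer), an explicit stack replaces the recursion, so in CPython each result is yielded directly instead of being passed up through a chain of k nested `yield from` generators. Children are pushed in reverse so the output order matches the recursive version; this assumes `alphabet` is a sequence such as a string or list.

```python
def strings_without_iter(alphabet, k, avoid):
    """Iterative version of the backtracking search (hypothetical helper)."""
    stack = ['']
    while stack:
        s = stack.pop()
        if any(s.endswith(t) for t in avoid):
            continue  # prune: every extension of s would contain a forbidden substring
        if len(s) == k:
            yield s
            continue
        # Push children in reverse so they are popped in alphabet order.
        for c in reversed(alphabet):
            stack.append(s + c)


print(list(strings_without_iter('abc', 3, ['ab']))[:5])
# ['aaa', 'aac', 'aca', 'acb', 'acc']
```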

For strings of length 13 over an alphabet of size 20, this should be a big improvement, but 20^13 (about 8.2 × 10^16) is an enormous number. So unless you forbid a lot of substrings, the number of solutions will still be very large. Any algorithm that generates h strings of length k needs Ω(hk) time just to write them out, so the running time of even the most efficient algorithm is output-sensitive: it is governed by how many solutions there are.
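Because the output can be this large, it may be worth counting the solutions before enumerating them. A minimal sketch (`count_strings_without` is a hypothetical name, and this counting step is not part of the original answer): a dynamic program whose state is the last m − 1 characters of the prefix, where m is the length of the longest forbidden string. That is all the suffix test ever needs, so the function returns how many strings `strings_without` would yield without generating any of them.

```python
from collections import Counter


def count_strings_without(alphabet, k, avoid):
    """Count length-k strings over alphabet containing no string in avoid.

    Hypothetical helper. State = the last (m - 1) characters of the prefix,
    where m is the longest forbidden length.
    """
    if '' in avoid:
        return 0  # the empty string occurs in every string
    keep = max((len(t) for t in avoid), default=1) - 1
    counts = Counter({'': 1})  # suffix state -> number of valid prefixes
    for _ in range(k):
        nxt = Counter()
        for state, n in counts.items():
            for c in alphabet:
                t = state + c
                if any(t.endswith(f) for f in avoid):
                    continue  # this extension would create a forbidden substring
                nxt[t[max(0, len(t) - keep):]] += n
        counts = nxt
    return sum(counts.values())


print(count_strings_without('abc', 3, ['ab']))  # 21, matching the example above
```

With 20 letters and length 13, forbidding only "ab" still leaves about 97% of all 20^13 strings, which illustrates why only a large set of forbidden substrings shrinks the output substantially.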
