
How to find total number of possible combinations for a string?

How can I find the total number of subsequences of a given string that start with a particular character, say 'a', and end with a particular character, say 'b'?

Example:
For the string 'aabb', suppose we want to count the subsequences that start with 'a' and end with 'b'. The valid subsequences are (ab) using indices (0,2), (ab) using (0,3), (ab) using (1,2), (ab) using (1,3), (aab) using (0,1,2), (aab) using (0,1,3), (abb) using (0,2,3), (abb) using (1,2,3), and aabb itself, so the total is 9. I can solve this for short strings, but how do I solve it for a large string where brute force doesn't work?

Note: we consider two subsequences to be different if they are built from different indices of the given string, even when they spell the same characters.
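To make the counting rule in the example concrete, here is a minimal brute-force sketch. The helper name `matching_index_sets` is hypothetical, and it assumes (as the example implies) that a valid subsequence has length at least 2. It enumerates increasing index tuples and keeps those whose first character is 'a' and last is 'b', which reproduces the nine index tuples listed above. It is exponential, so it is only for checking tiny inputs.

```python
from itertools import combinations

def matching_index_sets(s, first, last):
    # Hypothetical helper: yield every increasing index tuple of length >= 2
    # whose first character is `first` and whose last character is `last`.
    # Exponential in len(s), so only usable for tiny strings.
    for k in range(2, len(s) + 1):
        for idx in combinations(range(len(s)), k):
            if s[idx[0]] == first and s[idx[-1]] == last:
                yield idx

s = 'aabb'
sets = list(matching_index_sets(s, 'a', 'b'))
for idx in sets:
    # Show the characters and the indices they come from, e.g. "ab (0, 2)".
    print(''.join(s[i] for i in idx), idx)
print(len(sets))  # 9
```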

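def count(s, first, last):
    # Counts (start, end) index pairs with s[start] == first and
    # s[end] == last, i.e. contiguous substrings. For 'aabb' this
    # returns 4, not the 9 subsequences described above.
    n = len(s)
    total = 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            if s[i] == first and s[j - 1] == last:
                total += 1
    return total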

Before I post my main code I'll try to explain how it works. Let the source string be 'a123b'. The valid subsequences consist of all the subsets of '123', prefixed with 'a' and suffixed with 'b'. The set of all subsets is called the powerset, and the Itertools Recipes section of the itertools docs has code showing how to produce the powerset using combinations.

# Print all subsequences of '123', prefixed with 'a' and suffixed with 'b'
from itertools import combinations

src = '123'
for i in range(len(src) + 1):
    for s in combinations(src, i):
        print('a' + ''.join(s) + 'b')

output

ab
a1b
a2b
a3b
a12b
a13b
a23b
a123b

Here's a brute-force solution that uses that recipe.

from itertools import combinations

def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count

It can easily be shown that a set of n items has 2**n subsets. So rather than producing the subsets one by one, we can speed up the process by using that formula, which is what my count_fast function does.
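Written as a formula (the symbols $s_u$, $c_0$, $c_1$ are just notation for the string and the two target characters), count_fast adds one power of two per valid (start, end) pair, where the exponent is the number of characters strictly between the two positions. For 'aabb' the four pairs give the expected 9:

```latex
\text{count} = \sum_{\substack{u < v \\ s_u = c_0,\; s_v = c_1}} 2^{\,v-u-1}
\qquad
\text{'aabb'}:\;
\underbrace{2^1}_{(0,2)} + \underbrace{2^2}_{(0,3)} + \underbrace{2^0}_{(1,2)} + \underbrace{2^1}_{(1,3)}
= 2 + 4 + 1 + 2 = 9
```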

from itertools import combinations

def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count

def count_fast(src, targets):
    c0, c1 = targets
    # Find indices of the target chars
    idx = {c: [] for c in targets}
    for i, c in enumerate(src):
        if c in targets:
            idx[c].append(i)

    idx0, idx1 = idx[c0], idx[c1]
    count = 0
    for u in idx0:
        for v in idx1:
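            if v <= u:
                # v must come after u; when c0 == c1 this also skips
                # u == v, which would otherwise add 2 ** -1 == 0.5.
                continue
            # Every subset of the n characters strictly between u and v
            # gives one valid subsequence starting at u and ending at v.
            n = v - u - 1
            count += 2 ** n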
    return count

# Test

funcs = (
    count_bruteforce,
    count_fast,
)

targets = 'ab'

data = (
    'ab', 'aabb', 'a123b', 'aacbb', 'aabbb', 
    'zababcaabb', 'aabbaaabbb',
)

for src in data:
    print(src)
    for f in funcs:
        print(f.__name__, f(src, targets))
    print()

output

ab
count_bruteforce 1
count_fast 1

aabb
count_bruteforce 9
count_fast 9

a123b
count_bruteforce 8
count_fast 8

aacbb
count_bruteforce 18
count_fast 18

aabbb
count_bruteforce 21
count_fast 21

zababcaabb
count_bruteforce 255
count_fast 255

aabbaaabbb
count_bruteforce 730
count_fast 730
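A different technique, not from this answer, avoids the pairwise loops altogether with a single left-to-right pass. This is a sketch under the same counting rule as count_bruteforce (length at least 2, subsequences distinguished by index set); the name `count_linear` is hypothetical. `open_count` tracks how many subsequences of the prefix seen so far start with c0. Each c1 closes all of them once, and each new character doubles them (every open subsequence can skip or take it).

```python
def count_linear(src, targets):
    # Hypothetical single-pass counter: subsequences of src that start
    # with targets[0] and end with targets[1], length >= 2.
    c0, c1 = targets
    # Subsequences of the prefix so far whose first character is c0.
    open_count = 0
    total = 0
    for ch in src:
        if ch == c1:
            # Appending this c1 completes every open subsequence once.
            total += open_count
        # Each open subsequence may skip or take this character, and a
        # c0 here also starts a new one-character subsequence.
        open_count = 2 * open_count + (1 if ch == c0 else 0)
    return total

for src in ('ab', 'aabb', 'a123b', 'aacbb', 'aabbb', 'zababcaabb', 'aabbaaabbb'):
    print(src, count_linear(src, 'ab'))
```

It does O(len(src)) additions, but Python's ints are unbounded and `open_count` can grow to roughly 2**len(src), so the numbers themselves get long for big inputs.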

There may be a way to make this even faster by starting the inner loop at the correct place rather than using continue to skip unwanted indices.
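One way to start the inner loop at the correct place is a binary search over the sorted c1 positions with the standard `bisect` module. This is a minimal sketch, not the answerer's code; `count_fast_bisect` is a hypothetical name. `bisect_right(idx1, u)` returns the position of the first index greater than u, so every remaining v is valid and no `continue` is needed.

```python
from bisect import bisect_right

def count_fast_bisect(src, targets):
    # Hypothetical variant of count_fast that skips invalid v via bisect.
    c0, c1 = targets
    # enumerate yields indices in increasing order, so both lists are sorted.
    idx0 = [i for i, c in enumerate(src) if c == c0]
    idx1 = [i for i, c in enumerate(src) if c == c1]
    count = 0
    for u in idx0:
        # First position in idx1 holding an index strictly greater than u.
        start = bisect_right(idx1, u)
        for v in idx1[start:]:
            # 2 ** (characters strictly between u and v) subsequences.
            count += 2 ** (v - u - 1)
    return count

print(count_fast_bisect('aabbaaabbb', 'ab'))  # 730
```

This only removes the skipped iterations; the loop still visits every valid (u, v) pair.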

Easy, it should just be the number of letters to the power of two, i.e. n^2.

The Python implementation would just be n_substrings = n ** 2

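Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.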

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM