简体   繁体   中英

How to find total number of possible combinations for a string?

How to find total number of possible sub sequences for a string that start with a particular character say 'a' and end with a particular character say 'b' from a given string?

EXAMPLE:
for a string 'aabb' if we want to know the count of how many sub sequences are possible if the sub-sequence must start from character 'a' and end with character 'b' then valid sub sequences can be from (ab) contributed by index (0,2), (ab) contributed by index (0,3), (ab) contributed by index (1,2), (ab) contributed by index (1,3), (aab) using index (0,1,2) , (aab) using index (0,1,3) ,(abb) using index (0,2,3),(abb) using index (1,2,3) and aabb itself so total is 9 .I can solve this for a string of small length but how to solve this for a large string where brute force doesn't work

Note:We consider two sub strings to be different if they start or end at different indices of the given string.

def count(str,str1 ,str2 ):
l = len(str) 
count=0
for i in range(0, l+1):
    for j in range(i+1, l+1):
        if str[i] == str1 and str[j-1] == str2:
            count+=1
return count

Before I post my main code I'll try to explain how it works. Let the source string be 'a123b'. The valid subsequences consist of all the subsets of '123' prefixed with 'a' and suffixed with 'b'. The set of all subsets is called the powerset , and the itertools docs have code showing how to produce the powerset using combinations in the Itertools Recipes section.

# Print all subsequences of '123', prefixed with 'a' and suffixed with 'b'
from itertools import combinations

src = '123'
for i in range(len(src) + 1):
    for s in combinations(src, i):
        print('a' + ''.join(s) + 'b')

output

ab
a1b
a2b
a3b
a12b
a13b
a23b
a123b

Here's a brute-force solution which uses that recipe.

from itertools import combinations

def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count

It can be easily shown that the number of subsets of a set of n items is 2**n . So rather than producing the subsets one by one we can speed up the process by using that formula, which is what my count_fast function does.

from itertools import combinations

def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count

def count_fast(src, targets):
    c0, c1 = targets
    # Find indices of the target chars
    idx = {c: [] for c in targets}
    for i, c in enumerate(src):
        if c in targets:
            idx[c].append(i)

    idx0, idx1 = idx[c0], idx[c1]
    count = 0
    for u in idx0:
        for v in idx1:
            if v < u:
                continue
            # Calculate the number of valid subsequences
            # which start at u+1 and end at v-1. 
            n = v - u - 1
            count += 2 ** n
    return count

# Test

funcs = (
    count_bruteforce,
    count_fast,
)

targets = 'ab'

data = (
    'ab', 'aabb', 'a123b', 'aacbb', 'aabbb', 
    'zababcaabb', 'aabbaaabbb',
)

for src in data:
    print(src)
    for f in funcs:
        print(f.__name__, f(src, targets))
    print()

output

ab
count_bruteforce 1
count_fast 1

aabb
count_bruteforce 9
count_fast 9

a123b
count_bruteforce 8
count_fast 8

aacbb
count_bruteforce 18
count_fast 18

aabbb
count_bruteforce 21
count_fast 21

zababcaabb
count_bruteforce 255
count_fast 255

aabbaaabbb
count_bruteforce 730
count_fast 730

There may be a way to make this even faster by starting the inner loop at the correct place rather than using continue to skip unwanted indices.

Easy, it should just be the number of letters to the power of two. Ie, n^2

Python implementation would just be n_substrings = n ** 2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM