Python: split a string by a delimiter in all possible ways

Question

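This is probably closely related to similar questions such as Python 3.3: Split string and create all combination, but I could not derive a Pythonic solution from them.

The problem:

Given a str such as 'hi|guys|whats|app', I need every possible way of splitting it at the delimiter. Example: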

#splitting only once
['hi','guys|whats|app']
['hi|guys','whats|app']
['hi|guys|whats','app']
#splitting only twice
['hi','guys','whats|app']
['hi','guys|whats','app']
#splitting only three times
...
etc

I could write a backtracking algorithm, but does Python (e.g. itertools) provide a library that simplifies this?

Thanks in advance!!

Answer 1

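One approach, once the string has been split, is to use itertools.combinations to choose the split points in the list; the pieces at all other positions are joined back together.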

def lst_merge(lst, positions, sep='|'):
    '''Merge a list everywhere except at the split points in positions.

    A, B, C, D and 0, 1 -> A, B, C|D
    '''
    a = -1
    out = []
    # the last index closes the final group
    for b in list(positions)+[len(lst)-1]:
        out.append(sep.join(lst[a+1:b+1]))  # join with sep, not a hard-coded '|'
        a = b
    return out

def split_comb(s, split=1, sep='|'):
    from itertools import combinations
    l = s.split(sep)
    return [lst_merge(l, pos, sep=sep)
            for pos in combinations(range(len(l)-1), split)]

Example

>>> split_comb('hi|guys|whats|app', 0)
[['hi|guys|whats|app']]

>>> split_comb('hi|guys|whats|app', 1)
[['hi', 'guys|whats|app'],
 ['hi|guys', 'whats|app'],
 ['hi|guys|whats', 'app']]

>>> split_comb('hi|guys|whats|app', 2)
[['hi', 'guys', 'whats|app'],
 ['hi', 'guys|whats', 'app'],
 ['hi|guys', 'whats', 'app']]

>>> split_comb('hi|guys|whats|app', 3)
[['hi', 'guys', 'whats', 'app']]

>>> split_comb('hi|guys|whats|app', 4)
[] ## impossible

Rationale

ABCD -> A B C D
         0 1 2

combinations of split points: 0/1 or 0/2 or 1/2

0/1 -> merge on 2 -> A B CD
0/2 -> merge on 1 -> A BC D
1/2 -> merge on 0 -> AB C D
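The rationale above implies a simple count: with n tokens there are n-1 gaps, so exactly k splits can be placed in C(n-1, k) ways, and 2**(n-1) ways over all k. A minimal sketch that checks this against itertools.combinations (count_splits is a hypothetical helper name, not part of the answer; math.comb needs Python 3.8+):

```python
from itertools import combinations
from math import comb


def count_splits(n_tokens, k):
    """Number of ways to pick k split points among the n_tokens - 1 gaps."""
    return comb(n_tokens - 1, k)


tokens = 'hi|guys|whats|app'.split('|')
gaps = range(len(tokens) - 1)  # gap i sits between tokens[i] and tokens[i+1]

# The closed form agrees with brute-force enumeration for every k.
for k in range(len(tokens)):
    assert count_splits(len(tokens), k) == len(list(combinations(gaps, k)))

# Summing over all k gives every subset of the gaps.
total = sum(count_splits(len(tokens), k) for k in range(len(tokens)))
assert total == 2 ** (len(tokens) - 1)
```

This also explains why asking for more splits than there are gaps returns an empty list.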

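Generic function

This is a generic version that works like the one above but also accepts -1 for the split parameter, in which case it outputs all combinations.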

def lst_merge(lst, positions, sep='|'):
    a = -1
    out = []
    for b in list(positions)+[len(lst)-1]:
        out.append(sep.join(lst[a+1:b+1]))  # join with sep, not a hard-coded '|'
        a = b
    return out

def split_comb(s, split=1, sep='|'):
    from itertools import combinations, chain

    l = s.split(sep)

    if split == -1:
        # every number of splits, from 0 up to len(l)-1
        all_pos = chain.from_iterable(combinations(range(len(l)-1), r)
                                      for r in range(len(l)))
    else:
        all_pos = combinations(range(len(l)-1), split)

    return [lst_merge(l, pos, sep=sep)
            for pos in all_pos]

Example:

>>> split_comb('hi|guys|whats|app', -1)
[['hi|guys|whats|app'],
 ['hi', 'guys|whats|app'],
 ['hi|guys', 'whats|app'],
 ['hi|guys|whats', 'app'],
 ['hi', 'guys', 'whats|app'],
 ['hi', 'guys|whats', 'app'],
 ['hi|guys', 'whats', 'app'],
 ['hi', 'guys', 'whats', 'app']]

Answer 2

An approach using combinations and chain:

from itertools import combinations, chain


def partition(alist, indices):
    # https://stackoverflow.com/a/1198876/4001592
    pairs = zip(chain([0], indices), chain(indices, [None]))
    return (alist[i:j] for i, j in pairs)


s = 'hi|guys|whats|app'
delimiter_count = s.count("|")
splits = s.split("|")


for i in range(1, delimiter_count + 1):
    print("split", i)
    for combination in combinations(range(1, delimiter_count + 1), i):
        res = ["|".join(part) for part in partition(splits, combination)]
        print(res)

Output

split 1
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
split 2
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
split 3
['hi', 'guys', 'whats', 'app']

The idea is to generate every way of choosing (or removing) 1, 2 or 3 delimiters and to build the partitions from those choices.

Answer 3

Here is a recursive function I came up with:

def splitperms(string, i=0):
    if len(string) == i:
        return [[string]]
    elif string[i] == "|":
        return [*[[string[:i]] + split for split in splitperms(string[i + 1:])], *splitperms(string, i + 1)]
    else:
        return splitperms(string, i + 1)

Output:

>>> splitperms('hi|guys|whats|app')
[['hi', 'guys', 'whats', 'app'], ['hi', 'guys', 'whats|app'], ['hi', 'guys|whats', 'app'], ['hi', 'guys|whats|app'], ['hi|guys', 'whats', 'app'], ['hi|guys', 'whats|app'], ['hi|guys|whats', 'app'], ['hi|guys|whats|app']]
>>>
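The function above recurses once per character, so a long input can reach Python's recursion limit. As a hedged alternative, here is a sketch that recurses once per separator using str.partition and yields results lazily (split_variants is a hypothetical name, not the answer's code):

```python
def split_variants(text, sep='|'):
    """Yield every way of splitting text at occurrences of sep."""
    head, found, tail = text.partition(sep)
    if not found:
        # No separator left: the only option is the text itself.
        yield [text]
        return
    for rest in split_variants(tail, sep):
        # Option 1: split at this separator.
        yield [head] + rest
        # Option 2: keep this separator, gluing head onto the next piece.
        yield [head + sep + rest[0]] + rest[1:]


for parts in split_variants('hi|guys|whats|app'):
    print(parts)
```

The order of the results differs from the answer's output, but the set of splits is the same.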

Answer 4

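You can find the index of every '|', then, for each combination of those indices, replace the '|' at those positions with ',' and split on ',' (this assumes the string itself contains no ','), as shown below: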

>>> from itertools import combinations
>>> st = 'hi|guys|whats|app'
>>> idxs_rep = [idx for idx, s in enumerate(st) if s=='|']

>>> def combs(x):
...    return [c for i in range(len(x)+1) for c in combinations(x,i)]

>>> for idxs in combs(idxs_rep):        
...    lst_st = list(st)
...    for idx in idxs:
...        lst_st[idx] = ','
...    st2 = ''.join(lst_st)
...    print(st2.split(','))

['hi|guys|whats|app']
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
['hi', 'guys', 'whats', 'app']
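The replacement trick gives wrong results if the input already contains the character used as the temporary marker (',' in the session above). A minimal sketch of the same idea wrapped in a function, assuming a single-character separator and using a NUL character as the marker (split_by_replacing and the placeholder parameter are hypothetical names):

```python
from itertools import combinations


def split_by_replacing(text, sep='|', placeholder='\x00'):
    """Yield all splits by swapping chosen separators for a placeholder."""
    # Assumption: sep is one character and placeholder never occurs in text.
    if len(sep) != 1 or len(placeholder) != 1:
        raise ValueError('sep and placeholder must be single characters')
    if placeholder in text:
        raise ValueError('placeholder occurs in the input')
    indices = [i for i, ch in enumerate(text) if ch == sep]
    for r in range(len(indices) + 1):
        for chosen in combinations(indices, r):
            chars = list(text)
            for i in chosen:
                chars[i] = placeholder  # mark this separator as a split point
            yield ''.join(chars).split(placeholder)


# A comma in the input is now harmless.
print(list(split_by_replacing('a,b|c')))
```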

Answer 5

If you want all partitions, try partitions from more-itertools:

from more_itertools import partitions

s = 'hi|guys|whats|app'

for p in partitions(s.split('|')):
    print(list(map('|'.join, p)))

Output:

['hi|guys|whats|app']
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
['hi', 'guys', 'whats', 'app']

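If you only want a certain number of splits, then instead of splitting at every delimiter and re-joining the parts, you can take combinations of the delimiter indices and slice out the substrings accordingly: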

from itertools import combinations

s = 'hi|guys|whats|app'
splits = 2

indexes = [i for i, c in enumerate(s) if c == '|']
for I in combinations(indexes, splits):
    print([s[i+1:j] for i, j in zip([-1, *I], [*I, None])])

Output:

['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
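The index-based snippet compares single characters, so it only handles one-character delimiters. A hedged sketch of the same slicing idea as a reusable generator that also accepts longer separators (split_k is a hypothetical name; occurrences are found left to right without overlap, like str.split):

```python
from itertools import combinations


def split_k(text, k, sep='|'):
    """Yield every way of splitting text at exactly k occurrences of sep."""
    if not sep:
        raise ValueError('empty separator')
    # Start offsets of each non-overlapping occurrence of sep.
    starts = []
    pos = text.find(sep)
    while pos != -1:
        starts.append(pos)
        pos = text.find(sep, pos + len(sep))
    for chosen in combinations(starts, k):
        parts, begin = [], 0
        for cut in chosen:
            parts.append(text[begin:cut])
            begin = cut + len(sep)  # skip over the separator itself
        parts.append(text[begin:])
        yield parts


for parts in split_k('a::b::c', 1, sep='::'):
    print(parts)
```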

Answer 6

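I'm surprised that most answers use combinations, when this is clearly a binary power sequence (that is, several binary Cartesian products concatenated).

Let me elaborate: if we have n delimiters, we have 2**n possible strings, where each delimiter is either on or off. So if we map each bit of the integers from 0 to 2**n - 1 to one delimiter (0 means we don't split, 1 means we split), we can generate the whole thing very efficiently (without hitting the stack depth limit, and with the ability to pause and resume the generator, or even run it in parallel, using just a simple integer to track progress).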

def partition(index, tokens, separator):
    def helper():
        n = index
        for token in tokens:
            yield token
            if n % 2:
                yield separator
            n //= 2
    return ''.join(helper())

def all_partitions(txt, separator):
    tokens = txt.split(separator)
    for i in range(2**(len(tokens)-1)):
        yield partition(i, tokens, separator)

for x in all_partitions('hi|guys|whats|app', '|'):
    print(x)

Explanation (a space marks a separator whose bit is 0; the code above emits nothing at that position):

   hi|guys|whats|app
     ^    ^     ^
bit  0    1     2 (least significant bit first)

   hi guys whats app
     ^    ^     ^
0 =  0    0     0


   hi|guys whats app
     ^    ^     ^
1 =  1    0     0


   hi guys|whats app
     ^    ^     ^
2 =  0    1     0


   hi|guys|whats app
     ^    ^     ^
3 =  1    1     0


   hi guys whats|app
     ^    ^     ^
4 =  0    0     1


   hi|guys whats|app
     ^    ^     ^
5 =  1    0     1


   hi guys|whats|app
     ^    ^     ^
6 =  0    1     1


   hi|guys|whats|app
     ^    ^     ^
7 =  1    1     1
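The generator above yields joined strings. To get lists in the format the question asks for, the same bitmask idea can be sketched as follows, where bit k of the mask decides whether the separator after token k is a split point (split_by_mask and all_splits are hypothetical names, not the answer's code):

```python
def split_by_mask(tokens, mask, sep='|'):
    """Split after token k when bit k of mask is 1, otherwise re-join with sep."""
    parts = [tokens[0]]
    for k, token in enumerate(tokens[1:]):
        if (mask >> k) & 1:
            parts.append(token)        # bit set: start a new piece
        else:
            parts[-1] += sep + token   # bit clear: keep the separator
    return parts


def all_splits(text, sep='|'):
    """Yield every split of text, one per integer in range(2**(n-1))."""
    tokens = text.split(sep)
    for mask in range(2 ** (len(tokens) - 1)):
        yield split_by_mask(tokens, mask, sep)


for parts in all_splits('hi|guys|whats|app'):
    print(parts)
```

Because each result depends only on its integer mask, any single split can be computed on its own with split_by_mask, which is what makes pausing, resuming or parallelising straightforward.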

Answer timestamps

Answer 1: score 1, accepted, 2021-10-13 12:55:25
Answer 2: score 0, 2021-10-13 12:45:57
Answer 3: score 0, 2021-10-13 12:48:41
Answer 4: score 0, 2021-10-13 13:05:24
Answer 5: score 0, 2021-10-13 17:16:27
Answer 6: score 0, 2022-05-10 16:58:34
