
Find all list permutations of splitting a string in Python

I have a string of letters that I'd like to split into all possible combinations (the order of letters must remain fixed), so that:

s = 'monkey'

becomes:

combinations = [['m', 'onkey'], ['mo', 'nkey'], ['m', 'o', 'nkey'] ... etc]

Any ideas?

http://wordaligned.org/articles/partitioning-with-python contains an interesting post about sequence partitioning; here is the implementation they use:

#!/usr/bin/env python

# From http://wordaligned.org/articles/partitioning-with-python

from itertools import chain, combinations

def sliceable(xs):
    '''Return a sliceable version of the iterable xs.'''
    try:
        xs[:0]
        return xs
    except TypeError:
        return tuple(xs)

def partition(iterable):
    s = sliceable(iterable)
    n = len(s)
    # b, e: fixed outer boundaries; mid: candidate cut positions 1 .. n-1
    b, mid, e = [0], list(range(1, n)), [n]
    # Every subset of cut positions, smallest subsets first
    splits = (d for i in range(n) for d in combinations(mid, i))
    # chain(b, d) gives slice starts, chain(d, e) gives slice stops
    return [[s[sl] for sl in map(slice, chain(b, d), chain(d, e))]
            for d in splits]

if __name__ == '__main__':
    s = "monkey"
    for i in partition(s):
        print(i)

Which would print:

['monkey']
['m', 'onkey']
['mo', 'nkey']
['mon', 'key']
['monk', 'ey']
['monke', 'y']
['m', 'o', 'nkey']
['m', 'on', 'key']
['m', 'onk', 'ey']
['m', 'onke', 'y']
['mo', 'n', 'key']
['mo', 'nk', 'ey']
['mo', 'nke', 'y']
['mon', 'k', 'ey']
['mon', 'ke', 'y']
['monk', 'e', 'y']
...
['mo', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'k', 'e', 'y']
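The implementation above builds the whole list before returning it. A minimal lazy sketch of the same cut-point idea, assuming you want to stream partitions one at a time (`partitions_lazy` is a hypothetical name, not part of the wordaligned code):

```python
from itertools import combinations


def partitions_lazy(s):
    """Yield every way to split s into contiguous, order-preserving pieces.

    Each partition corresponds to a set of cut positions chosen from
    1 .. len(s) - 1; smaller cut sets are yielded first.
    """
    n = len(s)
    cut_points = range(1, n)
    for k in range(n):  # number of cuts: 0 .. n - 1
        for cuts in combinations(cut_points, k):
            bounds = (0,) + cuts + (n,)
            # Consecutive boundary pairs delimit the pieces.
            yield [s[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    for p in partitions_lazy("monkey"):
        print(p)
```

It yields the same order as the printed output above, but nothing is stored beyond the current partition.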
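A recursive generator also works:

def splitter(s):
    # Yield every split of s into two or more pieces, keeping letter order.
    for i in range(1, len(s)):
        start = s[0:i]
        end = s[i:]
        yield [start, end]
        # Prefix `start` onto every further split of the remainder
        for split in splitter(end):
            result = [start]
            result.extend(split)
            yield result

s = 'monkey'
combinations = list(splitter(s))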

Note that I used a generator so that long strings don't exhaust memory; this only helps if you iterate over it directly rather than materializing it with list().
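To get that benefit, consume the generator lazily. A small sketch using `itertools.islice`, assuming a variant of `splitter` that yields lists in every case (redefined here so the block is self-contained):

```python
from itertools import islice


def splitter(s):
    """Recursively yield each split of s into two or more ordered pieces."""
    for i in range(1, len(s)):
        start, end = s[:i], s[i:]
        yield [start, end]
        # Combine this prefix with every further split of the remainder.
        for split in splitter(end):
            yield [start] + split


# Only the first three splits are ever computed; nothing else is stored.
first_three = list(islice(splitter("monkey"), 3))
print(first_three)
# [['m', 'onkey'], ['m', 'o', 'nkey'], ['m', 'o', 'n', 'key']]
```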

The idea is to realize that the partitions (called "permutations" here) of a string s consist of s itself, plus, for each prefix X of s, X followed by each partition of the remaining string s\X. For example, permute('key'):

  1. {'key'} # 'key' itself
  2. {'k', 'ey'} # prefix 'k' + 1st partition of 'ey' = {'ey'}
  3. {'k', 'e', 'y'} # prefix 'k' + 2nd partition of 'ey' = {'e', 'y'}
  4. {'ke', 'y'} # prefix 'ke' + the only partition of 'y' = {'y'}
  5. The union of 1, 2, 3, and 4 yields all partitions of the string 'key'.
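The recursion also explains how many results to expect. A short derivation, writing P(n) for the number of partitions of a string of length n as produced by this recursion (the symbol P is introduced here only for illustration):

```latex
% P(n): number of partitions of a string of length n
P(1) = 1, \qquad
P(n) = 1 + \sum_{i=1}^{n-1} P(n-i) = 1 + \sum_{k=1}^{n-1} P(k)

% Assuming P(k) = 2^{k-1} for every k < n (induction hypothesis):
P(n) = 1 + \sum_{k=1}^{n-1} 2^{k-1} = 1 + \left(2^{n-1} - 1\right) = 2^{n-1}
```

For 'monkey' (n = 6) this gives 2^5 = 32, matching the number of lines printed for permute('monkey').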

With this in mind, a simple algorithm can be implemented:

def permute(s):
    result = [[s]]  # s itself, unsplit
    for i in range(1, len(s)):
        first = [s[:i]]
        rest = s[i:]
        # Prefix `first` onto every partition of the rest
        for p in permute(rest):
            result.append(first + p)
    return result

for p in permute('monkey'):
    print(p)

['monkey']
['m', 'onkey']
['m', 'o', 'nkey']
['m', 'o', 'n', 'key']
['m', 'o', 'n', 'k', 'ey']
['m', 'o', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'ke', 'y']
['m', 'o', 'nk', 'ey']
['m', 'o', 'nk', 'e', 'y']
['m', 'o', 'nke', 'y']
['m', 'on', 'key']
['m', 'on', 'k', 'ey']
['m', 'on', 'k', 'e', 'y']
['m', 'on', 'ke', 'y']
['m', 'onk', 'ey']
['m', 'onk', 'e', 'y']
['m', 'onke', 'y']
['mo', 'nkey']
['mo', 'n', 'key']
['mo', 'n', 'k', 'ey']
['mo', 'n', 'k', 'e', 'y']
['mo', 'n', 'ke', 'y']
['mo', 'nk', 'ey']
['mo', 'nk', 'e', 'y']
['mo', 'nke', 'y']
['mon', 'key']
['mon', 'k', 'ey']
['mon', 'k', 'e', 'y']
['mon', 'ke', 'y']
['monk', 'ey']
['monk', 'e', 'y']
['monke', 'y']
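permute() rebuilds the partitions of the same suffix many times (for example, those of 'ey' are recomputed for every split of 'monk' that reaches it). A sketch that memoizes the recursion with `functools.lru_cache`, assuming tuples are acceptable as output (`permute_cached` is a hypothetical name). The result still has 2 ** (len(s) - 1) entries, so this saves repeated work, not memory:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def permute_cached(s):
    """Return all partitions of s as a tuple of tuples.

    Same recursion as permute(), but each suffix is solved only once.
    Tuples are used because cached results are shared between callers
    and must not be mutated.
    """
    result = [(s,)]  # s itself, unsplit
    for i in range(1, len(s)):
        first = (s[:i],)
        # Prefix `first` onto every (cached) partition of the rest
        for p in permute_cached(s[i:]):
            result.append(first + p)
    return tuple(result)


for p in permute_cached("monkey"):
    print(list(p))
```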

A string-oriented (as opposed to list-oriented) approach is to think of each adjacent pair of characters as being separated by either a space or an empty string. That can be mapped to 1 and 0, so the number of possible splits is a power of 2:

2 ** (len(s) - 1)

For example, in "key" either '' or ' ' can separate 'k' and 'e', and either '' or ' ' can separate 'e' and 'y', which leads to 4 possibilities:

  • key ('' between 'k' and 'e', '' between 'e' and 'y')
  • k ey (' ' between 'k' and 'e', '' between 'e' and 'y')
  • k e y (' ' between 'k' and 'e', ' ' between 'e' and 'y')
  • ke y ('' between 'k' and 'e', ' ' between 'e' and 'y')
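The same 1/0 encoding can produce lists directly instead of space-separated strings. A readable sketch, assuming bit g - 1 of an integer mask marks a split between characters g - 1 and g (`split_by_mask` and `all_splits` are hypothetical helper names):

```python
def split_by_mask(s, mask):
    """Split s at every gap whose bit is set in mask.

    Bit g - 1 of mask controls the gap between s[g - 1] and s[g]
    (1 = split there, 0 = keep the characters together).
    """
    pieces = []
    start = 0
    for gap in range(1, len(s)):
        if mask >> (gap - 1) & 1:
            pieces.append(s[start:gap])
            start = gap
    pieces.append(s[start:])
    return pieces


def all_splits(s):
    """Yield one split per mask: 2 ** (len(s) - 1) in total."""
    for mask in range(2 ** max(len(s) - 1, 0)):
        yield split_by_mask(s, mask)


for parts in all_splits("key"):
    print(parts)
# ['key']
# ['k', 'ey']
# ['ke', 'y']
# ['k', 'e', 'y']
```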

An unreadable Python one-liner that gives you a generator of the split strings:

operator_positions = (''.join([str(a >> i & 1).replace('0', '').replace('1', ' ') + s[len(s)-1-i] for i in range(len(s)-1, -1, -1)]) for a in range(pow(2, len(s)-1)))

A readable version of this generator, with comments and a sample:

s = 'monkey'
s_length = len(s)-1  # number of gaps (each ' ' or '') between adjacent characters

operator_positions = (
    ''.join(
        [str(a >> i & 1).replace('0', '').replace('1', ' ') + s[s_length-i]
         for i in range(s_length, -1, -1)])   # extra top bit is always 0, so nothing precedes the first character
    for a in range(pow(2, s_length))   # count through every bit pattern
)
for i in operator_positions:
    print(i)

str(a >> i & 1) extracts bit i of a as the character '0' or '1', which is then replaced by '' or ' ', respectively. The loop covers one more bit position than there are gaps; because a < 2 ** s_length, that top bit is always 0, so the separator in front of the first character is always ''. That way, combining the separator with the first character always yields just the first character.
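The generator yields strings such as 'k ey'; if lists are needed, each string can be split on the inserted spaces. A sketch that wraps the same bit logic in functions (`operator_positions` as a function and `split_lists` are hypothetical; `" " if bit else ""` replaces the `str(...).replace(...)` chain with the same result), assuming s contains no whitespace of its own, since an original space would be indistinguishable from an inserted separator:

```python
def operator_positions(s):
    """Yield s with ' ' or '' placed in every gap, one bit pattern at a time."""
    if not s:
        return
    s_length = len(s) - 1  # number of gaps between adjacent characters
    for a in range(2 ** s_length):
        # Bit i of a picks the separator in front of s[s_length - i].
        # The top position (i == s_length) is always 0 because a < 2 ** s_length,
        # so nothing is ever placed before the first character.
        yield "".join(
            (" " if a >> i & 1 else "") + s[s_length - i]
            for i in range(s_length, -1, -1)
        )


def split_lists(s):
    """Convert each spaced string into a list of pieces.

    Assumes s contains no whitespace of its own.
    """
    return [spaced.split(" ") for spaced in operator_positions(s)]


print(split_lists("key"))
# [['key'], ['ke', 'y'], ['k', 'ey'], ['k', 'e', 'y']]
```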

Consider more_itertools.partitions:

Given

import more_itertools as mit


s = "monkey"

Demo

As-is:

list(mit.partitions(s))
#[[['m', 'o', 'n', 'k', 'e', 'y']],
# [['m'], ['o', 'n', 'k', 'e', 'y']],
# [['m', 'o'], ['n', 'k', 'e', 'y']],
# [['m', 'o', 'n'], ['k', 'e', 'y']],
# [['m', 'o', 'n', 'k'], ['e', 'y']],
# [['m', 'o', 'n', 'k', 'e'], ['y']],
# ...]

After some string joining:

[list(map("".join, x)) for x in mit.partitions(s)]

Output

[['monkey'],
 ['m', 'onkey'],
 ['mo', 'nkey'],
 ['mon', 'key'],
 ['monk', 'ey'],
 ['monke', 'y'],
 ['m', 'o', 'nkey'],
 ['m', 'on', 'key'],
 ['m', 'onk', 'ey'],
 ['m', 'onke', 'y'],
 ['mo', 'n', 'key'],
 ['mo', 'nk', 'ey'],
 ['mo', 'nke', 'y'],
 ['mon', 'k', 'ey'],
 ['mon', 'ke', 'y'],
 ['monk', 'e', 'y'],
 ['m', 'o', 'n', 'key'],
 ['m', 'o', 'nk', 'ey'],
 ['m', 'o', 'nke', 'y'],
 ['m', 'on', 'k', 'ey'],
 ['m', 'on', 'ke', 'y'],
 ['m', 'onk', 'e', 'y'],
 ['mo', 'n', 'k', 'ey'],
 ['mo', 'n', 'ke', 'y'],
 ['mo', 'nk', 'e', 'y'],
 ['mon', 'k', 'e', 'y'],
 ['m', 'o', 'n', 'k', 'ey'],
 ['m', 'o', 'n', 'ke', 'y'],
 ['m', 'o', 'nk', 'e', 'y'],
 ['m', 'on', 'k', 'e', 'y'],
 ['mo', 'n', 'k', 'e', 'y'],
 ['m', 'o', 'n', 'k', 'e', 'y']]

Install via: pip install more_itertools
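If adding a dependency is not an option, a fallback can be kept next to the import. A sketch, assuming the stdlib version below (which enumerates cut positions by a powerset ordered by size) reproduces the ordering shown in the output above; `string_partitions` and `_partitions` are hypothetical names:

```python
from itertools import combinations

try:
    from more_itertools import partitions as _partitions
except ImportError:  # stdlib fallback if more_itertools is not installed
    def _partitions(iterable):
        """Yield partitions of iterable as lists of lists, fewest pieces first."""
        seq = list(iterable)
        n = len(seq)
        for k in range(n):  # number of cuts
            for cuts in combinations(range(1, n), k):
                bounds = (0,) + cuts + (n,)
                yield [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def string_partitions(s):
    """Partition s and join each piece back into a string."""
    return [["".join(piece) for piece in p] for p in _partitions(s)]


for p in string_partitions("monkey"):
    print(p)
```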

My solution also lets you set a threshold for the minimum length of each substring.

This is my code:

def split_string(s, min_str_length=2, root_string=None, results=None):
    """
    :param s: word to split, string
    :param min_str_length: the minimum number of characters in each substring
    :param root_string: pieces already split off (leave empty; used by the recursion)
    :param results: accumulator for the output (leave empty; used by the recursion)
    :return: nested list of all possible splits of the word in which every
             substring has at least min_str_length characters
    """
    # None defaults give each top-level call fresh lists
    # (a mutable default like [] would be shared between calls).
    if root_string is None:
        root_string = []
    if results is None:
        results = []
    for i in range(min_str_length, len(s)):
        head, tail = s[:i], s[i:]
        if len(tail) >= min_str_length:
            # Record root + head + tail, then split the tail further.
            results.append(root_string + [head, tail])
            split_string(tail, min_str_length, root_string + [head], results)
    return results

Examples of use:

Input: split_string ('monkey', min_str_length = 1, root_string=[], results=[] )
Output: 
[['m', 'onkey'],
 ['m', 'o', 'nkey'],
 ['m', 'o', 'n', 'key'],
 ['m', 'o', 'n', 'k', 'ey'],
 ['m', 'o', 'n', 'k', 'e', 'y'],
 ['m', 'o', 'n', 'ke', 'y'],
 ['m', 'o', 'nk', 'ey'],
 ['m', 'o', 'nk', 'e', 'y'],
 ['m', 'o', 'nke', 'y'],
 ['m', 'on', 'key'],
 ['m', 'on', 'k', 'ey'],
 ['m', 'on', 'k', 'e', 'y'],
 ['m', 'on', 'ke', 'y'],
 ['m', 'onk', 'ey'],
 ['m', 'onk', 'e', 'y'],
 ['m', 'onke', 'y'],
 ['mo', 'nkey'],
 ['mo', 'n', 'key'],
 ['mo', 'n', 'k', 'ey'],
 ['mo', 'n', 'k', 'e', 'y'],
 ['mo', 'n', 'ke', 'y'],
 ['mo', 'nk', 'ey'],
 ['mo', 'nk', 'e', 'y'],
 ['mo', 'nke', 'y'],
 ['mon', 'key'],
 ['mon', 'k', 'ey'],
 ['mon', 'k', 'e', 'y'],
 ['mon', 'ke', 'y'],
 ['monk', 'ey'],
 ['monk', 'e', 'y'],
 ['monke', 'y']]

or

Input: split_string ('monkey', min_str_length = 2, root_string=[], results=[] )
Output: [['mo', 'nkey'], ['mo', 'nk', 'ey'], ['mon', 'key'], ['monk', 'ey']]
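Passing root_string and results explicitly, as in the calls above, is easy to forget. A generator sketch of the same recursion that needs no accumulator arguments, assuming the same ordering and the same rule that every piece has at least min_len characters (`split_min` is a hypothetical name):

```python
def split_min(s, min_len=2):
    """Yield splits of s into two or more pieces, each at least min_len long.

    A split is yielded before the further splits of its tail, which gives
    the same order as split_string().
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # i is the length of the first piece; the tail must keep min_len chars too.
    for i in range(min_len, len(s) - min_len + 1):
        head, tail = s[:i], s[i:]
        yield [head, tail]
        # Keep head fixed and split the tail further.
        for rest in split_min(tail, min_len):
            yield [head] + rest


print(list(split_min("monkey", 2)))
# [['mo', 'nkey'], ['mo', 'nk', 'ey'], ['mon', 'key'], ['monk', 'ey']]
```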

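Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.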

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM