Find all list permutations of splitting a string in Python
I have a string of letters that I'd like to split into all possible combinations (the order of the letters must remain fixed), so that:
s = 'monkey'
becomes:
combinations = [['m', 'onkey'], ['mo', 'nkey'], ['m', 'o', 'nkey'] ... etc]
Any ideas?
http://wordaligned.org/articles/partitioning-with-python contains an interesting post about sequence partitioning. Here is the implementation they use:
#!/usr/bin/env python
# From http://wordaligned.org/articles/partitioning-with-python
from itertools import chain, combinations

def sliceable(xs):
    '''Return a sliceable version of the iterable xs.'''
    try:
        xs[:0]
        return xs
    except TypeError:
        return tuple(xs)

def partition(iterable):
    s = sliceable(iterable)
    n = len(s)
    b, mid, e = [0], list(range(1, n)), [n]
    # Choose every combination of 0 to n-1 interior cut points.
    splits = (d for i in range(n) for d in combinations(mid, i))
    # Pair each start index with the next end index and slice.
    return [[s[sl] for sl in map(slice, chain(b, d), chain(d, e))]
            for d in splits]

if __name__ == '__main__':
    s = "monkey"
    for i in partition(s):
        print(i)
Which would print:
['monkey']
['m', 'onkey']
['mo', 'nkey']
['mon', 'key']
['monk', 'ey']
['monke', 'y']
['m', 'o', 'nkey']
['m', 'on', 'key']
['m', 'onk', 'ey']
['m', 'onke', 'y']
['mo', 'n', 'key']
['mo', 'nk', 'ey']
['mo', 'nke', 'y']
['mon', 'k', 'ey']
['mon', 'ke', 'y']
['monk', 'e', 'y']
...
['mo', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'k', 'e', 'y']
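To make the cut-point idea behind partition easier to follow, here is a minimal sketch that spells out each step for a short string. The function name cut_points_demo is made up for this illustration and is not part of the linked article.

```python
from itertools import combinations

def cut_points_demo(s):
    """Return (cuts, pieces) pairs for every way of cutting s.

    Hypothetical helper for illustration only.
    """
    n = len(s)
    out = []
    for k in range(n):  # k is the number of cuts, from 0 to n-1
        for cuts in combinations(range(1, n), k):
            # Surround the chosen cut positions with the two ends of the string.
            bounds = (0,) + cuts + (n,)
            pieces = [s[a:b] for a, b in zip(bounds, bounds[1:])]
            out.append((cuts, pieces))
    return out

for cuts, pieces in cut_points_demo("key"):
    print(cuts, pieces)
```

Each tuple of cut positions maps to exactly one split, which is why the output above is grouped by the number of pieces.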
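def splitter(s):
    # Parameter renamed from str so the built-in type is not shadowed.
    for i in range(1, len(s)):
        start = s[0:i]
        end = s[i:]
        # Yield a list here too, so every result has the same type.
        yield [start, end]
        for split in splitter(end):
            result = [start]
            result.extend(split)
            yield result

s = 'monkey'
combinations = list(splitter(s))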
Note that I defaulted to a generator to save you from running out of memory with long strings.
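As a rough illustration of why the generator matters, the sketch below takes only the first few splits of a 26-letter string with itertools.islice, so the remaining splits are never built. The splitter here is a self-contained restatement of the one above that yields lists, and the name first_three is just for this example.

```python
from itertools import islice

def splitter(s):
    """Lazily yield every split of s into two or more non-empty parts."""
    for i in range(1, len(s)):
        start, end = s[:i], s[i:]
        yield [start, end]
        for rest in splitter(end):
            yield [start] + rest

# Only three results are ever computed, however long the input is.
first_three = list(islice(splitter("abcdefghijklmnopqrstuvwxyz"), 3))
print(first_three)
```

Calling list(splitter(...)) on the same input would instead try to hold every split in memory at once.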
The idea is to realize that the permutation of a string s is equal to a set containing s itself, and a set union of each substring X of s with the permutation of s\X. For example, permute('key'):
{'key'} # 'key' itself
{'k', 'ey'} # substring 'k' union 1st permutation of 'ey' = {'ey'}
{'k', 'e', 'y'} # substring 'k' union 2nd permutation of 'ey' = {'e', 'y'}
{'ke', 'y'} # substring 'ke' union 1st and only permutation of 'y' = {'y'}
Together, these are all the permutations of key. With this in mind, a simple algorithm can be implemented:
>>> def permute(s):
...     result = [[s]]
...     for i in range(1, len(s)):
...         first = [s[:i]]
...         rest = s[i:]
...         for p in permute(rest):
...             result.append(first + p)
...     return result
...
>>> for p in permute('monkey'):
...     print(p)
...
['monkey']
['m', 'onkey']
['m', 'o', 'nkey']
['m', 'o', 'n', 'key']
['m', 'o', 'n', 'k', 'ey']
['m', 'o', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'ke', 'y']
['m', 'o', 'nk', 'ey']
['m', 'o', 'nk', 'e', 'y']
['m', 'o', 'nke', 'y']
['m', 'on', 'key']
['m', 'on', 'k', 'ey']
['m', 'on', 'k', 'e', 'y']
['m', 'on', 'ke', 'y']
['m', 'onk', 'ey']
['m', 'onk', 'e', 'y']
['m', 'onke', 'y']
['mo', 'nkey']
['mo', 'n', 'key']
['mo', 'n', 'k', 'ey']
['mo', 'n', 'k', 'e', 'y']
['mo', 'n', 'ke', 'y']
['mo', 'nk', 'ey']
['mo', 'nk', 'e', 'y']
['mo', 'nke', 'y']
['mon', 'key']
['mon', 'k', 'ey']
['mon', 'k', 'e', 'y']
['mon', 'ke', 'y']
['monk', 'ey']
['monk', 'e', 'y']
['monke', 'y']
A string (as opposed to list) oriented approach is to think of each adjacent pair of characters as being separated by either a space or an empty string. Those two choices can be mapped to 1 and 0, so the number of possible splits is a power of 2:
2 ** (len(s) - 1)
For example, "key" can have '' or ' ' between 'k' and 'e', and '' or ' ' between 'e' and 'y', which leads to 4 possibilities: 'key', 'ke y', 'k ey', and 'k e y'.
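The 0/1 mapping described above can be sketched directly with an integer bit mask, one bit per gap between characters. This version builds lists rather than space-separated strings; the function name splits_by_bitmask is made up for this illustration.

```python
def splits_by_bitmask(s):
    """Yield every split of s, one per integer in range(2 ** (len(s) - 1)).

    Hypothetical helper: bit i of the mask decides whether to cut
    between s[i] and s[i + 1].
    """
    if not s:
        return  # an empty string has no characters to split
    gaps = len(s) - 1
    for mask in range(2 ** gaps):
        parts = [s[0]]
        for i in range(gaps):
            if mask >> i & 1:
                parts.append(s[i + 1])   # bit set: start a new part
            else:
                parts[-1] += s[i + 1]    # bit clear: extend the current part
        yield parts

for parts in splits_by_bitmask("key"):
    print(parts)
```

For "key" there are two gaps, so the masks 0 through 3 produce the four splits.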
An unreadable Python one-liner that gives you a generator in string form:
operator_positions = (''.join([str(a >> i & 1).replace('0', '').replace('1', ' ') + s[len(s)-1-i] for i in range(len(s)-1, -1, -1)]) for a in range(pow(2, len(s)-1)))
A readable version of this generator with comments and a sample:
s = 'monkey'
s_length = len(s) - 1  # number of gaps that can hold ' ' or ''
operator_positions = (
    ''.join(
        [str(a >> i & 1).replace('0', '').replace('1', ' ') + s[s_length - i]
         for i in range(s_length, -1, -1)])  # extra digit so a blank string always precedes the first character
    for a in range(pow(2, s_length))  # binary number loop
)
for i in operator_positions:
    print(i)
str(a >> i & 1) extracts bit i of a as the string '0' or '1', which is then replaced by '' or ' ', respectively. The binary string is one digit longer than the number of gaps, so its first digit is always 0 and is replaced by ''. That way, when the separator is combined with the first character, the result is always just the first character.
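If lists are wanted instead of space-separated strings, the generator's output can be split on the space character. The sketch below restates the generator as a function (the names spaced_splits and as_lists are made up here) and assumes s is non-empty and contains no spaces of its own.

```python
def spaced_splits(s):
    """Yield each split of s as one string with spaces at the cut points.

    Hypothetical restatement of the generator above; assumes s is
    non-empty and contains no spaces.
    """
    n_gaps = len(s) - 1
    for a in range(2 ** n_gaps):
        yield "".join(
            # The top bit (i == n_gaps) is always 0, so nothing precedes s[0].
            (" " if a >> i & 1 else "") + s[n_gaps - i]
            for i in range(n_gaps, -1, -1)
        )

as_lists = [text.split(" ") for text in spaced_splits("key")]
print(as_lists)
```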
Consider more_itertools.partitions:
Given
import more_itertools as mit
s = "monkey"
Demo
As-is:
list(mit.partitions(s))
#[[['m', 'o', 'n', 'k', 'e', 'y']],
# [['m'], ['o', 'n', 'k', 'e', 'y']],
# [['m', 'o'], ['n', 'k', 'e', 'y']],
# [['m', 'o', 'n'], ['k', 'e', 'y']],
# [['m', 'o', 'n', 'k'], ['e', 'y']],
# [['m', 'o', 'n', 'k', 'e'], ['y']],
# ...]
After some string joining:
[list(map("".join, x)) for x in mit.partitions(s)]
Output
[['monkey'],
['m', 'onkey'],
['mo', 'nkey'],
['mon', 'key'],
['monk', 'ey'],
['monke', 'y'],
['m', 'o', 'nkey'],
['m', 'on', 'key'],
['m', 'onk', 'ey'],
['m', 'onke', 'y'],
['mo', 'n', 'key'],
['mo', 'nk', 'ey'],
['mo', 'nke', 'y'],
['mon', 'k', 'ey'],
['mon', 'ke', 'y'],
['monk', 'e', 'y'],
['m', 'o', 'n', 'key'],
['m', 'o', 'nk', 'ey'],
['m', 'o', 'nke', 'y'],
['m', 'on', 'k', 'ey'],
['m', 'on', 'ke', 'y'],
['m', 'onk', 'e', 'y'],
['mo', 'n', 'k', 'ey'],
['mo', 'n', 'ke', 'y'],
['mo', 'nk', 'e', 'y'],
['mon', 'k', 'e', 'y'],
['m', 'o', 'n', 'k', 'ey'],
['m', 'o', 'n', 'ke', 'y'],
['m', 'o', 'nk', 'e', 'y'],
['m', 'on', 'k', 'e', 'y'],
['mo', 'n', 'k', 'e', 'y'],
['m', 'o', 'n', 'k', 'e', 'y']]
Install via > pip install more_itertools.
My solution also allows you to set a threshold for the minimum size of each substring.
This is my code:
from itertools import chain

def split_string(s, min_str_length=2, root_string=None, results=None):
    """
    :param s: word to split, string
    :param min_str_length: the minimum number of characters in a substring
    :param root_string: leave empty
    :param results: leave empty
    :return: nested list of all possible combinations of word split according to the minimum substring length
    """
    # Use None defaults: mutable default lists would be shared between calls.
    if root_string is None:
        root_string = []
    if results is None:
        results = []
    for i in range(min_str_length, len(s)):
        if i == min_str_length:
            # Remember the prefix parts this call started with.
            primary_root_string = root_string
        else:
            root_string = primary_root_string
        if len(s[i:]) >= min_str_length:
            results.append(list(chain(*[root_string, [s[:i]], [s[i:]]])))
            root_string = list(chain(*[root_string, [s[:i]]]))
            # Recurse on the remainder, carrying the parts chosen so far.
            split_string(s[i:], min_str_length, root_string, results)
    return results
Examples of use:
Input: split_string ('monkey', min_str_length = 1, root_string=[], results=[] )
Output:
[['m', 'onkey'],
['m', 'o', 'nkey'],
['m', 'o', 'n', 'key'],
['m', 'o', 'n', 'k', 'ey'],
['m', 'o', 'n', 'k', 'e', 'y'],
['m', 'o', 'n', 'ke', 'y'],
['m', 'o', 'nk', 'ey'],
['m', 'o', 'nk', 'e', 'y'],
['m', 'o', 'nke', 'y'],
['m', 'on', 'key'],
['m', 'on', 'k', 'ey'],
['m', 'on', 'k', 'e', 'y'],
['m', 'on', 'ke', 'y'],
['m', 'onk', 'ey'],
['m', 'onk', 'e', 'y'],
['m', 'onke', 'y'],
['mo', 'nkey'],
['mo', 'n', 'key'],
['mo', 'n', 'k', 'ey'],
['mo', 'n', 'k', 'e', 'y'],
['mo', 'n', 'ke', 'y'],
['mo', 'nk', 'ey'],
['mo', 'nk', 'e', 'y'],
['mo', 'nke', 'y'],
['mon', 'key'],
['mon', 'k', 'ey'],
['mon', 'k', 'e', 'y'],
['mon', 'ke', 'y'],
['monk', 'ey'],
['monk', 'e', 'y'],
['monke', 'y']]
or
Input: split_string ('monkey', min_str_length = 2, root_string=[], results=[] )
Output: [['mo', 'nkey'], ['mo', 'nk', 'ey'], ['mon', 'key'], ['monk', 'ey']]
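For comparison, the same minimum-length constraint can be expressed as a small recursive generator that only considers cut positions leaving at least min_len characters on each side. This is a sketch rather than a drop-in replacement; the name splits_with_min_length and its min_len parameter are made up for this illustration.

```python
def splits_with_min_length(s, min_len=1):
    """Yield every split of s into two or more parts of at least min_len chars.

    Hypothetical alternative to split_string, for illustration only.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Cut only where both the head and the tail are long enough.
    for i in range(min_len, len(s) - min_len + 1):
        head, tail = s[:i], s[i:]
        yield [head, tail]
        for rest in splits_with_min_length(tail, min_len):
            yield [head] + rest

print(list(splits_with_min_length("monkey", 2)))
```

Because it is a generator, results can be consumed one at a time instead of being accumulated in a shared list.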