Python - get all in order splits of a string

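That is, for a sentence, break it up into all possible in-order combinations of words, with no words omitted.

For example, for the input "The cat sat on the mat"

the output would be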

[("The", "cat sat on the mat"),
("The cat", "sat on the mat"),  
("The cat", "sat", "on the mat")] #etc

but not

("The mat", "cat sat on the") # out of order
("The cat"), ("mat") # words missing

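I've looked at the methods in itertools, but I can't see that they do the job: combinations will miss items ("cat", "mat") and permutations will change the order.

Am I missing something in these tools, or are they just not the right thing to use?

(For clarity, this is not a question about how to split the string, but about how to get the combinations.)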

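Taking some inspiration from this blog post by WordAligned, and modifying Raymond Hettinger's partition recipe for Python 3 so that it returns every partition of your list, we can do this with chain and combinations from itertools.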

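from itertools import chain, combinations

def partition(input_list):
    # Return every way to cut input_list into contiguous, in-order chunks.
    n = len(input_list)
    # Every split starts at 0 and ends at n; the cut points in between
    # are any subset of the interior positions 1..n-1.
    b, mid, e = [0], list(range(1, n)), [n]
    splits = (d for i in range(n) for d in combinations(mid, i))
    return [[input_list[sl] for sl in map(slice, chain(b, d), chain(d, e))]
            for d in splits]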

Demo

>>> input_list = "The cat sat on the mat".split()
>>> print(partition(input_list))
[[['The', 'cat', 'sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the', 'mat']], [['The', 'cat', 'sat', 'on'], ['the', 'mat']], [['The', 'cat', 'sat', 'on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on'], ['the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the'], ['mat']], [['The', 'cat'], ['sat'], ['on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on'], ['the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the'], ['mat']], [['The', 'cat', 'sat', 'on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on', 'the', 'mat']], [['The'], ['cat'], ['sat', 'on'], ['the', 'mat']], [['The'], ['cat'], ['sat', 'on', 'the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the'], ['mat']], [['The'], ['cat', 'sat', 'on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the', 'mat']], [['The', 'cat'], ['sat'], ['on', 'the'], ['mat']], [['The', 'cat'], ['sat', 'on'], ['the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the', 'mat']], [['The'], ['cat'], ['sat'], ['on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on'], ['the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the'], ['mat']]]
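
The output above is a list of lists of words, whereas the question asks for tuples of phrases. A minimal sketch of one way to get that shape, using the same cut-point idea, is below. The name ordered_splits is hypothetical (it is not from the answer or from itertools), and the sketch assumes words are separated by whitespace.

```python
from itertools import combinations

def ordered_splits(sentence):
    """Yield tuples of phrases that cover every word of `sentence` in order."""
    words = sentence.split()  # assumption: whitespace-separated words
    n = len(words)
    for k in range(n):  # k = number of interior cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            # Pair each boundary with the next one to slice out a chunk,
            # then join the chunk's words back into a phrase.
            yield tuple(" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:]))

splits = list(ordered_splits("The cat sat on the mat"))
print(len(splits))  # 32
print(splits[1])    # ('The', 'cat sat on the mat')
```

A sentence of n words has n - 1 gaps, each either cut or not, so there are 2 ** (n - 1) results (32 here). The first result is the unsplit sentence; starting the outer loop at range(1, n) would skip it.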
