
How to generate all subsequent combinations of 1, 2 and 3 words from a given string in Python?

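I have a string in Python. I want to get all the one-word substrings, all the two-word substrings and all the three-word substrings. What is the most efficient way to do this?

My current solution is this: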

>>> s = "This is the example string of which I want to generate subsequent combinations"
>>> words = s.split()
>>> lengths = [1, 2, 3]
>>> ans = []
>>> for ln in lengths:
...     for i in range(len(words)-ln+1):
...         ans.append(" ".join(words[i:i+ln]))
... 
>>> print(ans)
['This', 'is', 'the', 'example', 'string', 'of', 'which', 'I', 'want', 'to', 'generate', 'subsequent', 'combinations', 'This is', 'is the', 'the example', 'example string', 'string of', 'of which', 'which I', 'I want', 'want to', 'to generate', 'generate subsequent', 'subsequent combinations', 'This is the', 'is the example', 'the example string', 'example string of', 'string of which', 'of which I', 'which I want', 'I want to', 'want to generate', 'to generate subsequent', 'generate subsequent combinations']

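You can do it like this (note that itertools.combinations also returns combinations of non-adjacent words, so this yields more than the contiguous runs shown in the question):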

from itertools import chain, combinations

def powerset(iterable):
    "powerset(['a','b','c']) --> ['a', 'b', 'c', 'a b', 'a c', 'b c', 'a b c']"
    s = list(iterable)
    # Join every combination of 1, 2 or 3 items (not only adjacent ones).
    return [" ".join(c) for c in chain.from_iterable(combinations(s, r) for r in range(1, 4))]

s = "This is the example string of which I want to generate subsequent combinations"
print(powerset(s.split()))

For more detail, see: https://stackoverflow.com/a/1482316/17073342
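To make the difference between combinations and contiguous runs concrete, here is a minimal sketch on a shortened four-word input. The shortened sentence and the names `pairs` and `adjacent` are only for illustration and are not part of the answer above.

```python
from itertools import combinations

words = "This is the example".split()

# Every 2-word combination, including non-adjacent pairs such as "This the".
pairs = [" ".join(c) for c in combinations(words, 2)]

# Only adjacent 2-word runs, as produced by the loop in the question.
adjacent = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

print(pairs)
print(adjacent)
```

If only adjacent words are wanted, the slicing approach from the question is the one to keep.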

FWIW, you can write what you have as a list comprehension:

[' '.join(words[i:i+l]) for l in [1,2,3] for i in range(len(words)-l+1)]

Is it faster? A little:

%%timeit
ans = []
for ln in [1,2,3]:
    for i in range(len(words)-ln+1):
        ans.append(" ".join(words[i:i+ln]))
        
# 8.46 µs ± 89.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
[' '.join(words[i:i+l]) for l in [1,2,3] for i in range(len(words)-l+1)]


# 7.03 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Is it more readable? Probably not. I would probably just stick with what you have.
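If readability is the concern, one option is to wrap the comprehension in a small function. This is only a sketch: the name `ngrams` and the `sizes` parameter are hypothetical and do not come from the answer itself.

```python
def ngrams(words, sizes=(1, 2, 3)):
    """Return contiguous word runs of each size, shortest sizes first."""
    return [
        " ".join(words[i:i + n])
        for n in sizes
        for i in range(len(words) - n + 1)
    ]


result = ngrams("a b c d".split())
print(result)
# ['a', 'b', 'c', 'd', 'a b', 'b c', 'c d', 'a b c', 'b c d']
```

When the input has fewer words than a given size, the inner range is empty, so that size simply contributes nothing.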

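I think the easiest to understand (for me anyway), and possibly the fastest, is to handle the first two words as a special case and then iterate over the remaining words while keeping track of the two previous words.

It also has the side benefit of being the fastest so far.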

words = "This is the example string of which I want to generate subsequent combinations".split()
# Assumes at least two words. Results are grouped by ending position, not by length.
prior_prior_word = words[0]
prior_word = words[1]
ans = [prior_prior_word, prior_word, f"{prior_prior_word} {prior_word}"]
for word in words[2:]:
    ans.append(f"{word}")
    ans.append(f"{prior_word} {word}")
    ans.append(f"{prior_prior_word} {prior_word} {word}")
    prior_prior_word = prior_word
    prior_word = word
print(ans)
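One thing to be aware of is that this version returns the same strings as the question's loop, but in a different order, and it needs at least two words. The sketch below checks both points. The function names `by_length` and `by_position` and the short-input guard are my own additions for illustration.

```python
def by_length(words):
    """Contiguous runs grouped by length, as in the question's loop."""
    return [
        " ".join(words[i:i + n])
        for n in (1, 2, 3)
        for i in range(len(words) - n + 1)
    ]


def by_position(words):
    """Same runs, built in one pass while remembering the two previous words."""
    if len(words) < 2:
        # Hypothetical guard: with fewer than two words only single words exist.
        return list(words)
    prior_prior_word, prior_word = words[0], words[1]
    ans = [prior_prior_word, prior_word, f"{prior_prior_word} {prior_word}"]
    for word in words[2:]:
        ans.append(word)
        ans.append(f"{prior_word} {word}")
        ans.append(f"{prior_prior_word} {prior_word} {word}")
        prior_prior_word, prior_word = prior_word, word
    return ans


words = "This is the example string of which I want to generate subsequent combinations".split()
a, b = by_length(words), by_position(words)
print(a == b)                  # False: the order differs
print(sorted(a) == sorted(b))  # True: the same strings are produced
```

If the order by length matters to you, sort the result by word count afterwards or stay with the original loop.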

If you want to time it with timeit, you can try:

import timeit

ruchit = '''
words = "This is the example string of which I want to generate subsequent combinations".split()
def test(words):
    lengths = [1, 2, 3]
    ans = []
    for ln in lengths:
        for i in range(len(words)-ln+1):
            ans.append(" ".join(words[i:i+ln]))
    return ans
'''

tom = '''
words = "This is the example string of which I want to generate subsequent combinations".split()
def test(words):
    return [' '.join(words[i:i+l]) for l in [1,2,3] for i in range(len(words)-l+1)]
'''
        
jonsg = '''
words = "This is the example string of which I want to generate subsequent combinations".split()
def test(words):
    prior_prior_word = words[0]
    prior_word = words[1]
    ans = [prior_prior_word, prior_word, f"{prior_prior_word} {prior_word}"]
    for word in words[2:]:
        ans.append(f"{word}")
        ans.append(f"{prior_word} {word}")
        ans.append(f"{prior_prior_word} {prior_word} {word}")
        prior_prior_word = prior_word
        prior_word = word
    return ans
'''

runs = 1_000_000
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
print(f"Test: ruchit Time: {timeit.timeit('test(words)', setup=ruchit, number=runs)}")
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
print(f"Test: tom Time: {timeit.timeit('test(words)', setup=tom, number=runs)}")
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
print(f"Test: jonsg Time: {timeit.timeit('test(words)', setup=jonsg, number=runs)}")
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

This gives me:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Test: ruchit Time: 8.692457999999998
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Test: tom Time: 7.512314900000002
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Test: jonsg Time: 3.7232652
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Your mileage may vary.
