
How to create all possible sentences of length 100 characters from a list of strings in Python

I am trying to build sentences that are exactly 100 characters long from a given list of strings. I need to find all possible sentences, using permutations of the words. There must be a single space between consecutive words, and no word may be used more than once. The list is given below:

['saintliness', 'wearyingly', 'shampoo', 'headstone', 'dripdry', 'elapse', 'redaction', 'allegiance', 'expressionless', 'awesomeness', 'hearkened', 'aloneness', 'beheld', 'courtship', 'swoops', 'memphis', 'attentional', 'pintsized', 'rustics', 'hermeneutics', 'dismissive', 'delimiting', 'proposes', 'between', 'postilion', 'repress', 'racecourse', 'matures', 'directions', 'bloodline', 'despairing', 'syrian', 'guttering', 'unsung', 'suspends', 'coachmen', 'usurpation', 'convenience', 'portal', 'deferentially', 'tarmacadam', 'underlay', 'lifetime', 'nudeness', 'influences', 'unicyclists', 'endangers', 'unbridled', 'kennedy', 'indian', 'reminiscent', 'ravish', 'republics', 'nucleic', 'acacia', 'redoubled', 'minnows', 'bucklers', 'decays', 'garnered', 'aussies', 'harshen', 'monogram', 'consignments', 'continuum', 'pinion', 'inception', 'immoderate', 'reiterated', 'hipster', 'stridently', 'relinquished', 'microphones', 'righthanders', 'ethereally', 'glutted', 'dandies', 'entangle', 'selfdestructive', 'selfrighteous', 'rudiments', 'spotlessly', 'comradeinarms', 'shoves', 'presidential', 'amusingly', 'schoolboys', 'phlogiston', 'teachable', 'letting', 'remittances', 'armchairs', 'besieged', 'monophthongs', 'mountainside', 'aweless', 'redialling', 'licked', 'shamming', 'eigenstate']

Approach:

My first approach is to use backtracking and permutations to generate all sentences, but I think the complexity will be too high because my list is so big.

Is there another method, or some built-in functions or packages, that I can use here? What is the best way to do this in Python? Any pointers would be helpful.

This problem is similar to integer partitioning in number theory.

The complexity of the problem can (presumably) be reduced using some of the constraints that are encoded in the problem statement:

  1. The lengths of the words in the word list.
  2. How often each word length repeats: for example, X words in the list have length 8.

Here's a possible general approach (it would take some refining):

  • Find all partitions of the number 100 using only the lengths of the words in the word list. (You would start from the word lengths and their repeat counts, not by brute-forcing all possible partitions.)

  • Filter out partitions that use a length value more often than there are words of that length in the list.

  • Apply combinations of words onto the partitions. A set of words of equal length is mapped to the matching length values in a partition. Say, for example, you have the partition (15+15+15+10+10+10+10+5+5+5): you would then choose 3 of the length-15 words, 4 of the length-10 words, and 3 of the length-5 words. (I'm ignoring the space separation issue here.)

  • Generate permutations of all the combinations over all the partitions.
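
The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: the names `length_partitions` and `count_sentences` are made up, and the space separation is handled by charging each word `len(word) + 1` against a budget of `target + 1`. Rather than materialising every sentence, it counts them: for each partition, multiply the number of ways to choose the words of each length, then multiply by the number of orderings.

```python
from collections import Counter
from math import comb, factorial


def length_partitions(words, target):
    """Yield {word_length: count} dicts that fill `target` characters exactly.

    Each word is charged len(word) + 1 (the word plus one space) against a
    budget of target + 1, so the last word needs no special case.
    """
    available = Counter(len(w) for w in words)   # how many words per length
    lengths = sorted(available, reverse=True)

    def extend(pos, remaining, chosen):
        if remaining == 0:
            yield dict(chosen)
            return
        if pos == len(lengths):
            return
        size = lengths[pos]
        cost = size + 1
        # Never use a length more often than the list has words of that length.
        max_count = min(available[size], remaining // cost)
        for count in range(max_count, -1, -1):
            if count:
                chosen[size] = count
            yield from extend(pos + 1, remaining - count * cost, chosen)
            chosen.pop(size, None)

    yield from extend(0, target + 1, {})


def count_sentences(words, target):
    """Count sentences of exactly `target` characters without building them."""
    available = Counter(len(w) for w in words)
    total = 0
    for part in length_partitions(words, target):
        choices = 1
        for size, count in part.items():
            choices *= comb(available[size], count)   # which words of this length
        total += choices * factorial(sum(part.values()))  # orderings of the words
    return total


print(list(length_partitions(["ab", "cd", "efg"], 5)))
print(count_sentences(["ab", "cd", "efg"], 6))
```

Counting first tells you how large the output would be before you commit to generating it.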

You can't do it.

Think about it: even for selecting 4 words you already have 100 × 99 × 98 × 97 possibilities, almost 100 million.

Given the lengths of your words, a 100-character sentence will contain at least 8 of them. There are 100 × 99 × 98 × … × 93 possibilities for the first 8 words alone. That's approximately 7×10^15, a totally infeasible number.
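
The arithmetic is easy to check with `math.perm` (Python 3.8+), which counts ordered selections without repetition. The variable names are just for illustration.

```python
import math

# Ordered choices of 4 and of 8 distinct words out of 100
four_words = math.perm(100, 4)    # 100 * 99 * 98 * 97
eight_words = math.perm(100, 8)   # 100 * 99 * ... * 93

print(f"{four_words:,}")      # 94,109,400
print(f"{eight_words:.2e}")   # 7.50e+15
```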

Simplify a bit: change all the strings from "xxx" to "xxx " (with a trailing space), then set the sentence length to 101. This lets you use len(x) instead of len(x)+1 and eliminates the edge case for the last word in the sentence. As you traverse and build the sentence left to right, you can eliminate words that would overflow the length, based on the sentence you've constructed so far.
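
A minimal sketch of that idea, assuming a plain depth-first traversal (the function name `exact_sentences` and its parameters are my own, not from the question). It is still exponential on the full list, so try it on a small one.

```python
def exact_sentences(words, target=100):
    """Yield every sentence of exactly `target` characters, using each word at most once."""
    padded = [w + " " for w in words]   # "xxx" -> "xxx "
    goal = target + 1                    # the trailing space adds one character

    def build(prefix, used):
        for i, piece in enumerate(padded):
            if i in used:
                continue                 # duplicate words are not allowed
            candidate = prefix + piece
            if len(candidate) > goal:
                continue                 # this word would overflow: prune it
            if len(candidate) == goal:
                yield candidate[:-1]     # drop the trailing space
            else:
                used.add(i)
                yield from build(candidate, used)
                used.remove(i)

    yield from build("", set())


print(list(exact_sentences(["ab", "cd", "efg"], 5)))   # ['ab cd', 'cd ab']
```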

UPDATE:

Consider this to be a base-n number problem, where n is the number of words you have. Create a vector initialized with 0 [NOTE: it's only a fixed size here to illustrate]:

acc = [0, 0, 0, 0]

This is your "accumulator".

Now construct your sentence:

dict[acc[0]] + dict[acc[1]] + dict[acc[2]] + dict[acc[3]]

So, you get able able able able

Now increment the most significant "digit" in acc, the one at the right-hand end. Its index is denoted by "curpos". Here curpos is 3.

[0, 0, 0, 1]

Now you get able able able baker

You keep bumping acc[curpos] until you hit [0, 0, 0, n]. Now you've got a "carry out". "Go left" by decrementing curpos to 2, then increment acc[curpos]. If it doesn't "carry out", "go right" by incrementing curpos and setting acc[curpos] = 0. If you had gotten a carry out, you'd do a "go left" by decrementing curpos to 1.

This is a form of backtracking (e.g. the "go left"), but you don't need a tree. Just this acc vector and a state machine with three states: goleft, goright, test/trunc/output/inc.

After the "go right", curpos will be back at the "most significant" position. That is, the sentence length constructed from acc[0 to curpos - 1] (the length without adding the final word) is less than 100. If it's too long (e.g. it's already over 100), do a "go left". If it's too short (e.g. you've got to add another word to get near [enough] to 100), do a "go right".

When you get a carry out and curpos == 0, you're done.

I recently devised this as a solution to the "vampire number challenge", and the traversal you need is very similar.
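
Here is one way the accumulator traversal might look in Python. Treat it as a sketch under assumptions: the name `odometer_sentences` is invented, acc grows and shrinks instead of having a fixed size, curpos is simply the last index of acc, and the trailing-space trick is used so the goal is `target + 1`.

```python
def odometer_sentences(words, target):
    """Yield sentences of exactly `target` characters using an index vector, no recursion."""
    padded = [w + " " for w in words]
    goal = target + 1
    n = len(padded)
    acc = [0]       # acc[i] is the index of the word at position i
    lengths = [0]   # lengths[i] is the padded length of the words before position i

    while acc:
        curpos = len(acc) - 1
        i = acc[curpos]
        if i == n:
            # Carry out: go left and bump the digit there.
            acc.pop()
            lengths.pop()
            if acc:
                acc[-1] += 1
            continue
        if i in acc[:curpos]:
            acc[curpos] += 1             # word already used in this sentence
            continue
        total = lengths[curpos] + len(padded[i])
        if total == goal:
            yield " ".join(words[j] for j in acc)   # exact fit: output, then bump
            acc[curpos] += 1
        elif total < goal:
            acc.append(0)                # too short: go right
            lengths.append(total)
        else:
            acc[curpos] += 1             # too long: try the next word here


print(list(odometer_sentences(["ab", "cd", "efg"], 5)))   # ['ab cd', 'cd ab']
```

The loop ends when the carry out happens at position 0, which empties acc.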

Your problem size is way too large, but if 1) your actual problem is much smaller in scope, and/or 2) you have a lot of time and a very fast computer, you can generate these permutations using a recursive generator.

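def f(string, list1):
    # Recursively extend `string` with each unused word in `list1`
    for word in list1:
        new_string = string + (' ' if string else '') + word
        # If there are other constraints that will allow you to prune branches,
        # you can add those conditions here and break out of the for loop
        if len(new_string) >= 100:
            # Long enough: emit the first 100 characters (the last word may be cut)
            yield new_string[:100]
        else:
            # Recurse on the remaining words so that no word is used twice
            list2 = list1[:]
            list2.remove(word)
            yield from f(new_string, list2)

# list1 is the word list from the question; check() stands for whatever
# you want to do with each sentence
x = f('', list1)
for sentence in x:
    check(sentence)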

One caveat is that this may produce identical sentences if two words at the end get truncated to look the same.

I am not going to provide a complete solution, but I'll walk through my thinking.

Constraints:

  • A permutation of your complete list that exceeds 100 characters can be immediately thrown out. (OK, 99 + len(longest_word).)
  • You are essentially dealing with a subset of the power set of the elements in your list.

Given that:

  • Build the power set, but discard any sentences that exceed your maximum
  • Filter the final set for sentences that exactly match your needs

So you can have the following:

def construct_sentences(dictionary: list, length: int) -> list:
    # Returns (total_length, words) tuples; each word counts as len(word) + 1
    if not dictionary:
        return [(0, [])]
    else:
        word = dictionary[0]
        word_length = len(word) + 1  # +1 for the separating space
        subset_length = length - word_length
        # Note: the reduced budget is passed down whether or not `word` ends up in a sentence
        sentence_subset = construct_sentences(dictionary[1:], subset_length)
        new_sentences = []
        for sentence_length, sentence in sentence_subset:
            if sentence_length + word_length <= length:
                new_sentences.append((sentence_length + word_length, sentence + [word]))
        return new_sentences + sentence_subset

I'm using tuples to carry the length alongside each list and make it easily available for comparison. The result of the above function is a list of sentences that are all no longer than the given length (which is key when considering potential permutations: 100 is fairly short, so there is a vast number of permutations that can be readily discarded). The next step would be to simply filter out any sentence that isn't long enough (i.e. 100 characters).

Note that at this point you have every possible word list that meets your criteria, but each list of n words can still be reordered in n! ways. Still, that becomes a more manageable situation. With a list of 100 words, averaging under 9 characters a word, a sentence holds about 10 words on average, so each list has roughly 10! (about 3.6 million) orderings, which isn't the worst situation in the world...

You'll have to modify it for your truncation case, of course, but this gets you in the ballpark. Unless I completely missed something, which is always possible. I suspect something is wrong, because running this produces a surprisingly short list.
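
One possible reason for the short list: construct_sentences passes length - word_length down to the recursive call even for sentences that end up skipping that word, so the budget shrinks with every word in the dictionary. Below is a hedged variant that passes the full budget down instead, followed by the exact-length filter and the reordering step. The names `word_sets` and `sentences` are hypothetical, and this is still exponential, so it is only practical on a small word list.

```python
from itertools import permutations


def word_sets(words, budget):
    """Yield (padded_length, chosen_words) for every subset that fits in `budget`.

    Each word is charged len(word) + 1, matching the original function.
    """
    if not words:
        yield 0, []
        return
    first, rest = words[0], words[1:]
    cost = len(first) + 1
    # The same full budget is passed down; `first` is only charged when used.
    for used, chosen in word_sets(rest, budget):
        yield used, chosen                       # subset without `first`
        if used + cost <= budget:
            yield used + cost, [first] + chosen  # subset with `first`


def sentences(words, length):
    """Yield every ordering of every word set that is exactly `length` characters."""
    goal = length + 1   # every word was charged one space, including the last
    for used, chosen in word_sets(words, goal):
        if used == goal:
            for order in permutations(chosen):
                yield " ".join(order)


print(sorted(sentences(["ab", "cd", "efg"], 5)))   # ['ab cd', 'cd ab']
```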

