Concatenating two parts of a word using an iterator

Question

I need to concatenate certain words that appear split in a list of words, such as "computer" (below). These words were split in the list by line breaks, and I want to fix this.

lst=['love','friend', 'apple', 'com', 'puter']

The expected result is:

lst=['love','friend', 'apple', 'computer']

My code doesn't work. Can anyone help me do that?

The code I am trying is:

from collections import defaultdict
import enchant
import string

words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
myit = iter(words)
dic = enchant.Dict('en_UK')
lst = []
errors = []

for i in words:
    if dic.check(i) is True:
        lst.append(i)
    if dic.check(i) is False:
        a = i + next(myit)
    if dic.check(a) is True:
        lst.append(a)
    else:
        continue

print(lst)

Answer 1

The main problem with your code is that you are iterating over words in the for loop and, separately, through the iterator myit. These two iterations are independent, so you cannot use next(myit) within your loop to get the word after i (also, if i is the last word, there is no next word). In addition, your problem can be complicated by the fact that a split word may have parts that are themselves in the dictionary (e.g. printable is a word, but so are print and able).
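To make the independence of the two iterations concrete, here is a minimal sketch using only built-ins. The names pairs, shared and joined are illustrative and do not come from the question.

```python
words = ['love', 'friend', 'apple', 'com', 'puter']

# Case 1: a separate iterator. The for loop walks the list itself,
# so myit is never advanced by the loop and still sits at the start.
myit = iter(words)
pairs = []
for i in words:
    if i == 'com':
        # next(myit) returns the first element, not the word after 'com'
        pairs.append((i, next(myit)))

# Case 2: a shared iterator. The for loop and next() consume the
# same iterator, so next() really returns the following word.
shared = iter(words)
joined = []
for i in shared:
    if i == 'com':
        joined.append(i + next(shared))

print(pairs)   # [('com', 'love')]
print(joined)  # ['computer']
```

In the second case the word consumed by next() is also skipped by the loop, which is the behaviour wanted when joining two fragments.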

Assuming a simple scenario where the parts of a split word are never in the dictionary themselves, I think this algorithm could work better for you:

import enchant

words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
dic = enchant.Dict('en_UK')
lst = []
# The word that you are currently considering
current = ''
for i in words:
    # Add the next word
    current += i
    # If the current word is in the dictionary
    if dic.check(current):
        # Add it to the list
        lst.append(current)
        # Clear the current word
        current = ''
    # If the word is not in the dictionary we keep adding words to current

print(lst)
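For readers without enchant installed, the same accumulating idea can be tried with a plain set standing in for the dictionary. This is only a sketch under that assumption: the function name merge_split_words, the vocabulary set and the returned leftover fragment are hypothetical additions, not part of the answer above.

```python
def merge_split_words(words, vocabulary):
    """Join consecutive fragments until they form a known word.

    Returns the list of recognised words and whatever fragment
    was still pending when the input ran out.
    """
    merged = []
    current = ''
    for word in words:
        # Extend the pending fragment with the next piece
        current += word
        if current in vocabulary:
            merged.append(current)
            current = ''
    return merged, current


# Hypothetical sample vocabulary standing in for enchant.Dict
vocabulary = {'love', 'friend', 'car', 'apple', 'computer'}
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']

print(merge_split_words(words, vocabulary))
# (['love', 'friend', 'car', 'apple', 'computer'], 'vi')

# Limitation: an unmatched fragment absorbs every word after it
print(merge_split_words(['vi', 'love'], vocabulary))
# ([], 'vilove')
```

Returning the leftover makes the limitation visible: a fragment that never completes a word swallows the words that follow it, so this approach only suits input where every fragment is eventually completed.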

Answer 2

Notwithstanding the fact that this method is not very robust (you would miss "ham-burger", for example), the main error was that you looped over the list itself rather than over the iterator. Here is a corrected version.

Note that I renamed the variables to give them more expressive names, and I replaced the dictionary check with a simple word in dic test against a sample vocabulary. The module you import is not part of the standard library, which makes your code difficult to run as-is for those of us who don't have it.

dic = {'love', 'friend', 'car', 'apple', 
       'computer', 'banana'}

words=['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
words_it = iter(words)

valid_words = []

for word in words_it:
    if word in dic:
        valid_words.append(word)
    else:
        try:
            concatenated = word + next(words_it)
            if concatenated in dic:
                valid_words.append(concatenated)
        except StopIteration:
            pass

print(valid_words)
# ['love', 'friend', 'car', 'apple', 'computer']

You need the try ... except part in case the last word of the list is not in the dictionary, as next() will raise StopIteration in that case.
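As an alternative to try ... except, the built-in next() accepts a default value as its second argument, which is returned instead of raising StopIteration. The sketch below uses that form with a sample vocabulary set; the variable name following is illustrative.

```python
dic = {'love', 'friend', 'car', 'apple', 'computer', 'banana'}
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']

words_it = iter(words)
valid_words = []

for word in words_it:
    if word in dic:
        valid_words.append(word)
        continue
    # next() with a default returns None when the iterator is exhausted
    following = next(words_it, None)
    if following is not None and word + following in dic:
        valid_words.append(word + following)

print(valid_words)
# ['love', 'friend', 'car', 'apple', 'computer']
```

Both forms behave the same here. The default-value form avoids an exception handler, at the cost of an explicit None check.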

Answer 1: 2019-03-01 11:30:19

Answer 2 (accepted): 2019-03-01 11:46:28