
Concatenate two parts of a word in a list using an iterator

I need to concatenate certain words that appear split in a list of words, such as "computer" (below). The words are split because of line breaks in the source text, and I want to fix this.

lst=['love','friend', 'apple', 'com', 'puter']

The expected result is:

lst=['love','friend', 'apple', 'computer']

My code doesn't work. Can anyone help me with this?

The code I am trying is:

from collections import defaultdict
import enchant
import string

words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
myit = iter(words)
dic = enchant.Dict('en_UK')
lst = []
errors = []

for i in words:
    if dic.check(i) is True:
        lst.append(i)
    if dic.check(i) is False:
        a = i + next(myit)
    if dic.check(a) is True:
        lst.append(a)
    else:
        continue

print(lst)

The main problem with your code is that you iterate over words in the for loop and, separately, through the iterator myit. These two iterations are independent, so you cannot use next(myit) inside the loop to get the word after i (also, if i is the last word, there is no next word). In addition, your problem can be complicated by the fact that the parts of a split word may themselves be in the dictionary (e.g. printable is a word, but so are print and able).
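To see the independence of the two iterations concretely, here is a minimal sketch that uses only the list from the question. The test on 'com' stands in for the dictionary check, which is an assumption made so the snippet runs without enchant.

```python
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
myit = iter(words)

joined = None
for i in words:
    # The for loop walks `words` with its own hidden iterator.
    # `myit` is a separate iterator that has not advanced yet.
    if i == 'com':
        # First call to next(myit): returns the FIRST element, not 'puter'.
        joined = i + next(myit)

print(joined)  # comlove
```

Because myit was never advanced before the loop reached 'com', next(myit) yields 'love' rather than the word that follows 'com'.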

Assuming a simple scenario where the parts of a split word are never in the dictionary themselves, I think this algorithm would work better for you:

import enchant

words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
# 'en_GB' is the usual tag for British English; the question's 'en_UK'
# is not a standard locale code, so the dictionary may not be found.
dic = enchant.Dict('en_GB')
lst = []
# The (possibly partial) word currently being built
current = ''
for i in words:
    # Append the next fragment
    current += i
    # If the accumulated text is a dictionary word
    if dic.check(current):
        # Add it to the result and start over
        lst.append(current)
        current = ''
    # Otherwise keep accumulating fragments in current

print(lst)
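For readers without enchant installed, the same accumulate-until-valid loop can be sketched with a plain set as the vocabulary. The function name merge_fragments, the is_word parameter, and the sample vocabulary are all assumptions introduced for illustration. The sketch also returns whatever is left in the buffer, so unmatched fragments are visible instead of silently dropped.

```python
def merge_fragments(words, is_word):
    """Join consecutive fragments until the accumulated text is a word.

    `is_word` is any callable returning True for valid words
    (hypothetical helper; with enchant it could be dic.check).
    """
    merged = []
    current = ''
    for fragment in words:
        current += fragment
        if is_word(current):
            merged.append(current)
            current = ''
    # `current` holds any trailing fragments that never formed a word
    return merged, current


# Sample vocabulary standing in for a real dictionary (assumption)
vocabulary = {'love', 'friend', 'car', 'apple', 'computer'}
merged, leftover = merge_fragments(
    ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi'],
    vocabulary.__contains__,
)
print(merged)    # ['love', 'friend', 'car', 'apple', 'computer']
print(leftover)  # vi
```

One limitation worth keeping in mind: a single fragment that never completes a word stays in the buffer, and every later word is appended to it, so nothing after that point is recognized.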

Setting aside the fact that this method is not very robust (you would miss "ham-burger", for example), the main error is that you loop over the list itself rather than over the iterator. Here is a corrected version.

Note that I renamed the variables to give them more expressive names, and I replaced the dictionary check with a simple word in dic test against a sample vocabulary: the module you import is not part of the standard library, which makes your code difficult to run as-is for those of us who don't have it.

dic = {'love', 'friend', 'car', 'apple',
       'computer', 'banana'}

words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
words_it = iter(words)

valid_words = []

# Loop over the iterator itself, so next(words_it) consumes the following word
for word in words_it:
    if word in dic:
        valid_words.append(word)
    else:
        try:
            concatenated = word + next(words_it)
            if concatenated in dic:
                valid_words.append(concatenated)
        except StopIteration:
            # The last word was not in the vocabulary and has no successor
            pass

print(valid_words)
# ['love', 'friend', 'car', 'apple', 'computer']

You need the try ... except part in case the last word of the list is not in the dictionary, because next() raises StopIteration when the iterator is exhausted.
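As an alternative to try ... except, the built-in next() accepts a second argument that is returned when the iterator is exhausted. The sketch below is a variant of the loop above under that approach; the name following and the None sentinel are choices made for illustration, and the vocabulary is the same sample set.

```python
dic = {'love', 'friend', 'car', 'apple', 'computer', 'banana'}
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
words_it = iter(words)

valid_words = []

for word in words_it:
    if word in dic:
        valid_words.append(word)
        continue
    # next() returns the default (None) instead of raising StopIteration
    following = next(words_it, None)
    if following is not None and word + following in dic:
        valid_words.append(word + following)

print(valid_words)
# ['love', 'friend', 'car', 'apple', 'computer']
```

Both versions behave the same on this input; the default-argument form simply avoids the exception handler.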
