Concatenate two parts of a word in a list using an iterator
I need to concatenate certain words that appear split in a list of words, such as "computer" (below). These words appear split in the list because of line breaks, and I want to fix this.
lst=['love','friend', 'apple', 'com', 'puter']
The expected result is:
lst=['love','friend', 'apple', 'computer']
My code doesn't work. Can anyone help me do that?
The code I am trying is:
from collections import defaultdict
import enchant
import string
words=['love', 'friend', 'car', 'apple',
       'com', 'puter', 'vi']
myit = iter(words)
dic=enchant.Dict('en_UK')
lst=[]
errors=[]
for i in words:
    if dic.check(i) is True:
        lst.append(i)
    if dic.check(i) is False:
        a= i + next(myit)
        if dic.check(a) is True:
            lst.append(a)
        else:
            continue
print(lst)
The main problem with your code is that you iterate over words in two places: in the for loop and through the iterator myit. These two iterations are independent, so you cannot use next(myit) inside your loop to get the word after i. myit only advances when you call next() on it, so the first call returns the first word of the list no matter what i is. (Also, if i is the last word, there is no next word.)
Your problem can also be complicated by split words whose parts are themselves in the dictionary (e.g. printable is a word, but so are print and able).
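To see the independence of the two iterations, here is a stripped-down run of the same pattern. As a simplification for illustration, the enchant check is replaced by a hard-coded test for the fragment 'com':

```python
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
myit = iter(words)

joined = None
for i in words:
    if i == 'com':
        # The for loop walks `words` with its own hidden iterator;
        # `myit` has not been advanced yet, so next() returns the FIRST word.
        joined = i + next(myit)

print(joined)  # comlove, not computer
```

The for loop has reached 'com', but myit is still at the start of the list. The join therefore produces 'comlove' instead of 'computer'.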
Assuming a simple scenario where the parts of a split word are never in the dictionary, I think this algorithm could work better for you:
import enchant

words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
dic = enchant.Dict('en_UK')
lst = []
# The word that you are currently considering
current = ''
for i in words:
    # Add the next word
    current += i
    # If the current word is in the dictionary
    if dic.check(current):
        # Add it to the list
        lst.append(current)
        # Clear the current word
        current = ''
    # If the word is not in the dictionary, we keep adding words to current
print(lst)
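The accumulation loop above assumes that the parts of a split word are never words themselves. If that assumption does not hold (the print/able/printable case), one possible variation is a one-fragment lookahead that prefers the joined word whenever it exists. This variation is not from either answer. The sketch swaps enchant for a hypothetical sample vocabulary in a Python set, and the function name merge_split_words is made up for illustration. It also only handles words split into exactly two pieces:

```python
def merge_split_words(fragments, vocabulary):
    """Rejoin fragments, preferring a join with the next fragment
    whenever the joined string is itself in `vocabulary`."""
    result = []
    i = 0
    while i < len(fragments):
        current = fragments[i]
        # Look ahead one fragment: if the two pieces form a known word, take the join.
        if i + 1 < len(fragments) and current + fragments[i + 1] in vocabulary:
            result.append(current + fragments[i + 1])
            i += 2
        elif current in vocabulary:
            result.append(current)
            i += 1
        else:
            # Unknown fragment that cannot be joined with the next one: drop it.
            i += 1
    return result


# Hypothetical sample vocabulary standing in for a real dictionary.
vocabulary = {'love', 'print', 'able', 'printable', 'car', 'pet', 'carpet', 'computer'}

print(merge_split_words(['love', 'print', 'able', 'com', 'puter'], vocabulary))
# ['love', 'printable', 'computer']
print(merge_split_words(['car', 'pet'], vocabulary))
# ['carpet']
```

The trade-off is the reverse of the original problem. Two genuine neighbouring words that happen to form a longer word ("car" followed by "pet") get merged. Whether that is acceptable depends on how often such pairs occur in your text.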
Notwithstanding the fact that this method is not very robust (you would miss "ham-burger", for example, because "ham" and "burger" are each words on their own), the main error was that you didn't loop over the iterator but over the list itself. Here is a corrected version.
Note that I renamed the variables to give them more expressive names, and I replaced the dictionary check with a simple word in dic test against a sample vocabulary. The module you import is not part of the standard library, which makes your code as-is difficult to run for those of us who don't have it.
dic = {'love', 'friend', 'car', 'apple',
       'computer', 'banana'}
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
words_it = iter(words)
valid_words = []
# Loop over the iterator itself, so next() inside the loop returns the word after `word`
for word in words_it:
    if word in dic:
        valid_words.append(word)
    else:
        try:
            concatenated = word + next(words_it)
            if concatenated in dic:
                valid_words.append(concatenated)
        except StopIteration:
            pass
print(valid_words)
# ['love', 'friend', 'car', 'apple', 'computer']
You need the try ... except part in case the last word of the list is not in the dictionary, as next() will raise StopIteration in that case.
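As an alternative to the try ... except, the built-in next() accepts a second argument: a default value it returns instead of raising StopIteration when the iterator is exhausted. Below is a sketch of the same loop using an empty-string default. The variable name following is introduced here for illustration:

```python
dic = {'love', 'friend', 'car', 'apple',
       'computer', 'banana'}
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
words_it = iter(words)
valid_words = []
for word in words_it:
    if word in dic:
        valid_words.append(word)
    else:
        # next(iterator, default) returns '' here instead of raising StopIteration
        # when `word` is the last element of the list.
        following = next(words_it, '')
        concatenated = word + following
        if concatenated in dic:
            valid_words.append(concatenated)
print(valid_words)
# ['love', 'friend', 'car', 'apple', 'computer']
```

With the empty-string default, a trailing unknown fragment such as 'vi' is simply checked on its own again and dropped. This matches the behaviour of the try ... except version.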