My list contains duplicate elements and I don't know why

Question

I have a list of strings in a list called "texts". I am trying to scan each string for each of the words in a list called "key_words". If any of the key words is in the string, it goes into "list1". If none of the key words is in the string, it goes into "list2". My goal is for each string to be in its appropriate list once. The problem is that because I have three words in "key_words", a string with any of the words will go into list1 three times. I don't know why this is happening and I've been stuck working on this for an hour even though this seems pretty simple. Any help greatly appreciated.

list1 = []
list2 = []
key_words = ['must', 'should', 'wish']

for text in texts:

    for word in key_words:

        if text not in list1 and text not in list2:

            if word in text:
                 list1.append(text)

            else:
                list2.append(text)

Answer 1

Firstly, your code has a bug:

"If any of the keywords are in text, it should go to list1"

However, in your code you copy the text to list2 as soon as the first keyword is not present, without checking the remaining keywords. The trick to solving this is in the quote above. Here is a simple and efficient solution:

import re

keyword_regex = '|'.join(map(re.escape, key_words))  # Escape each keyword; use re.compile() if the pattern is reused many times

for text in texts:
    if re.search(keyword_regex, text):  # Success if any keyword is in text
        list1.append(text)
    else:
        list2.append(text)
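
If whole-word matching is wanted, the same idea can be tightened a little. This is only a minimal sketch: the sample texts and the helper name partition_by_keywords are assumptions for illustration, not part of the question. Note that adding \b changes the behaviour compared with a plain substring search, so "wishful" no longer counts as containing "wish".

```python
import re

key_words = ['must', 'should', 'wish']
# Sample data, assumed for illustration only
texts = ["You must go", "A wishful thought", "Nothing here"]

# Compile once; escape each keyword and require word boundaries around it
keyword_regex = re.compile(r'\b(?:' + '|'.join(map(re.escape, key_words)) + r')\b')

def partition_by_keywords(texts, pattern):
    """Return (matching, non_matching), placing each text exactly once."""
    matching, non_matching = [], []
    for text in texts:
        if pattern.search(text):  # one decision per text, not per keyword
            matching.append(text)
        else:
            non_matching.append(text)
    return matching, non_matching

list1, list2 = partition_by_keywords(texts, keyword_regex)
print(list1)  # ['You must go']
print(list2)  # ['A wishful thought', 'Nothing here']
```

Because the decision is made once per text, no string can land in a list more than once unless it appears more than once in texts.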

Answer 2

While looping over the keywords, you add the text to the lists multiple times. Track whether any keyword matched, then append the text once after the inner loop:

list1 = []
list2 = []
key_words = ['must', 'should', 'wish']
texts = ["must the a hooray", "hooray should the a", "a the an"]

for text in texts:
    found = False
    if text not in list1 and text not in list2:
        for word in key_words:
            if word in text:
                found = True
                break
        if found:
            list1.append(text)
        else:
            list2.append(text)

print(list1)
print(list2)

Generates:

['must the a hooray', 'hooray should the a']

['a the an']
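
The flag-and-break pattern can also be written with the built-in any(), which stops at the first keyword it finds. A minimal sketch using the same sample data; it leaves out the "text not in list1 and text not in list2" check, so it assumes texts contains no repeated strings:

```python
key_words = ['must', 'should', 'wish']
texts = ["must the a hooray", "hooray should the a", "a the an"]

list1 = []
list2 = []
for text in texts:
    # any() short-circuits on the first match, like found = True; break
    if any(word in text for word in key_words):
        list1.append(text)
    else:
        list2.append(text)

print(list1)  # ['must the a hooray', 'hooray should the a']
print(list2)  # ['a the an']
```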

Answer 3

This scans every string in texts and adds each one to the appropriate list only if it has not already been inserted.

list1 = []
list2 = []
key_words = ['must', 'should', 'wish']

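for text in texts:
    # A text belongs in list1 if any keyword occurs in it
    if any(word in text for word in key_words):
        if text not in list1:
            list1.append(text)
    # Otherwise it goes to list2, again only once
    elif text not in list2:
        list2.append(text)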
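
For longer inputs, checking "text not in list1" on every iteration rescans the list each time. One possible variation, sketched here with made-up sample data and a hypothetical helper set named seen, is to remember the strings already placed in a set:

```python
key_words = ['must', 'should', 'wish']
# Sample data with repeated strings, assumed for illustration only
texts = ["I wish", "no match", "I wish", "you should", "no match"]

list1 = []
list2 = []
seen = set()  # strings already placed in either list

for text in texts:
    if text in seen:
        continue  # skip repeats so each string appears only once
    seen.add(text)
    if any(word in text for word in key_words):
        list1.append(text)
    else:
        list2.append(text)

print(list1)  # ['I wish', 'you should']
print(list2)  # ['no match']
```

The lists keep the order in which each string first appears; the set is only used for the membership test.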

Answer 4

You need to scan the words in the text, not in the key_words list. The latter is only used to check the condition and decide between list1 and list2.

This option uses re.findall to split the text into words, without punctuation. Once you have the list of words, you can iterate over it and check whether each word is in key_words.

In the following example I'm using just one text string; you can extend the code to a list of texts.

This is what happens to text when applying re.findall:

import re

text = 'Must the show go on? I wish, it should! It must.'
print(re.findall(r'\w+', text))
#=> ['Must', 'the', 'show', 'go', 'on', 'I', 'wish', 'it', 'should', 'It', 'must']

re.findall runs once, when the loop starts. Here is the complete code:

list1 = []
list2 = []
key_words = ['must', 'should', 'wish']

for txt_word in re.findall(r'\w+', text):
    if txt_word.lower() in key_words:  # note .lower(): the comparison ignores case
        list1.append(txt_word)  # to avoid duplicates, append only if txt_word is not already in list1
    else:
        list2.append(txt_word)

This is the output:

print(list1) #=>['Must', 'wish', 'should', 'must']
print(list2) #=> ['the', 'show', 'go', 'on', 'I', 'it', 'It']
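
Extending this to a list of texts, as the question asks, might look like the sketch below. The sample texts are assumptions for illustration. Unlike the example above, it sorts whole strings rather than individual words, and it turns key_words into a set so the check becomes a set intersection:

```python
import re

key_words = {'must', 'should', 'wish'}  # a set, for fast membership tests
# Sample data, assumed for illustration only
texts = ['Must the show go on?', 'I wish, it should!', 'Nothing to see here.']

list1 = []
list2 = []
for text in texts:
    # Lower-case the text, then split it into words without punctuation
    words = set(re.findall(r'\w+', text.lower()))
    if words & key_words:  # non-empty intersection: at least one keyword present
        list1.append(text)
    else:
        list2.append(text)

print(list1)  # ['Must the show go on?', 'I wish, it should!']
print(list2)  # ['Nothing to see here.']
```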

4 answers

Answer 1
2 2019-01-26 06:02:06

Answer 2
1 2019-01-26 06:00:16

Answer 3
0 2019-01-26 08:49:37

Answer 4
0 2019-01-26 11:04:34
