
Deleting items from a Python dictionary

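I am trying to build a spam classification application in Python, but I get the following error. I don't understand why, since I am using the .keys method to remove items from the dictionary, so this shouldn't be a problem, should it? I have tried stripping out all of the functions except the dictionary function to find the cause, but I can't seem to get around the issue.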

Python code

    import os
    import numpy as np
    from collections import Counter
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.metrics import confusion_matrix

    def make_Dictionary(train_dir):
        emails = [os.path.join(train_dir,f) for f in os.listdir(train_dir)]    
        all_words = []       
        for mail in emails:    
            with open(mail) as m:
                for i,line in enumerate(m):
                    if i == 2:
                        words = line.split()
                        all_words += words

        dictionary = Counter(all_words)

        list_to_remove = dictionary.keys()
        for item in list_to_remove:
            if item.isalpha() == False: 
                del dictionary[item]
            elif len(item) == 1:
                del dictionary[item]
        dictionary = dictionary.most_common(3000)
        return dictionary

    def extract_features(mail_dir): 
        files = [os.path.join(mail_dir,fi) for fi in os.listdir(mail_dir)]
        features_matrix = np.zeros((len(files),3000))
        docID = 0;
        for fil in files:
          with open(fil) as fi:
            for i,line in enumerate(fi):
              if i == 2:
                words = line.split()
                for word in words:
                  wordID = 0
                  for i,d in enumerate(dictionary):
                    if d[0] == word:
                      wordID = i
                      features_matrix[docID,wordID] = words.count(word)
            docID = docID + 1     
        return features_matrix

    # Create a dictionary of words with its frequency

    train_dir = r'.\train-mails'
    dictionary = make_Dictionary(train_dir)

    # Prepare feature vectors per training mail and its labels

    train_labels = np.zeros(702)
    train_labels[351:701] = 1
    train_matrix = extract_features(train_dir)

    # Training SVM and Naive bayes classifier and its variants

    model1 = LinearSVC()


    model1.fit(train_matrix,train_labels)


    # Test the unseen mails for Spam

    test_dir = r'.\test-mails'
    test_matrix = extract_features(test_dir)
    test_labels = np.zeros(260)
    test_labels[130:260] = 1

    result1 = model1.predict(test_matrix)


    print (confusion_matrix(test_labels,result1))
    print (confusion_matrix(test_labels,result2))

Error

RuntimeError: dictionary changed size during iteration

This does not work in Python 3.x because keys() returns a view of the dictionary's keys rather than a list.

Another option is to use list to force a copy of the keys. This also works in Python 3.x:

for item in list(list_to_remove):
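As a minimal sketch of this approach, the loop from the question can be run against a small Counter. The sample words below are made up for illustration and stand in for the text read from the mail files:

```python
from collections import Counter

# Hypothetical sample words (not from the original mails).
dictionary = Counter(["free", "offer", "a", "2024", "free", "click!", "win"])

# list(...) takes a snapshot of the keys, so deleting entries
# inside the loop no longer disturbs the iteration.
for item in list(dictionary.keys()):
    if not item.isalpha() or len(item) == 1:
        del dictionary[item]

remaining = sorted(dictionary)
print(remaining)
```

Only the purely alphabetic words longer than one character should remain, and no RuntimeError is raised.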

dictionary.keys() actually returns a view that references the original dictionary's keys.

You can check this by doing:

 a_dict = {'a': 1}
 keys = a_dict.keys() # keys is dict_keys(['a'])
 a_dict['b'] = 2 # keys is dict_keys(['a', 'b'])

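That is why you get the error: del dictionary[item] also changes list_to_remove, which is a view of the same dictionary, and changing a dictionary's size while iterating over it is not allowed.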
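The failure can be reproduced in isolation. This is a small sketch for Python 3 with a made-up three-key dictionary; the variable error_message is introduced here only to capture the exception text:

```python
a_dict = {"a": 1, "b": 2, "c": 3}
keys = a_dict.keys()  # a live view, not a copy

error_message = None
try:
    for key in keys:
        # Deleting shrinks the dictionary while the view is being iterated.
        del a_dict[key]
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

The first deletion succeeds, and the error is raised when the loop asks the view for its next key.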

You can avoid this by making a copy of the keys before looping over them. The simplest way to do this is with the list constructor. So changing your line

list_to_remove = dictionary.keys()

to:

list_to_remove = list(dictionary.keys())

solves the problem.
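A different route, not taken from the answers here, is to avoid deleting at all and build a filtered Counter instead. The helper name filter_words and the sample words are hypothetical; the filter conditions mirror the ones in the question:

```python
from collections import Counter

def filter_words(counts, limit=3000):
    """Keep alphabetic words longer than one character, most frequent first."""
    kept = Counter({word: n for word, n in counts.items()
                    if word.isalpha() and len(word) > 1})
    return kept.most_common(limit)

# Hypothetical sample words for illustration.
sample = Counter(["spam", "spam", "ham", "x", "42", "ham", "spam"])
top_words = filter_words(sample)
print(top_words)
```

Because nothing is removed from the dictionary being iterated, the size-change error cannot occur in this form.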

Edit after comments

Note that this behavior only occurs in Python 3. In Python 2, the .keys() method returns a plain list that holds no reference to the dictionary:

a_dict = {'a': 1}
keys = a_dict.keys() # keys is ['a']
a_dict['b'] = 2 # keys is still ['a']

On this, see the Python 3.0 changelog:

Some well-known APIs no longer return lists:

  • dict methods dict.keys(), dict.items() and dict.values() return “views” instead of lists.
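The same live behavior can be seen for items() and values(). The sketch below assumes Python 3 and uses a made-up dictionary, with list() taking a snapshot that later changes do not affect:

```python
a_dict = {"a": 1}
items = a_dict.items()    # live view of (key, value) pairs
values = a_dict.values()  # live view of values

a_dict["b"] = 2
items_after = list(items)    # the views now include the new entry
values_after = list(values)

snapshot = list(a_dict.keys())  # an independent copy of the keys
a_dict["c"] = 3                 # does not change the snapshot

print(items_after, values_after, snapshot)
```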

