
Python KeyError Dictionary

  • I have a list called tweets_data
  • Each element of the list is a dictionary
  • One of the keys of each dictionary is 'text'
  • But the raw data is missing 'text' in some of them

That is why I want to remove the dictionaries that are missing 'text'. This is what my code looks like:

for i in range(len(tweets_data)):
    try:
        print tweets_data[i]['text']
    except KeyError:
        tweets_data.remove(tweets_data[i])
        i += 1

And I am receiving this error:

IndexError: list index out of range

My question: is there a cleverer way to remove the missing data from my list so that I won't get such an error? Thanks in advance!

You can't remove items from a list while you're iterating over it without confusing the indexes. Each time you remove an item, the list gets shorter - but you're still counting up to the length of the original list and expecting to find elements there.
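The shrinking-list problem can be demonstrated with a tiny made-up example (the dictionaries here are toy stand-ins for your tweets):

```python
# Toy stand-in for tweets_data: two entries lack the 'text' key.
data = [{'text': 'a'}, {'id': 1}, {'id': 2}, {'text': 'b'}]

try:
    for i in range(len(data)):      # range(4) is computed once, up front
        if 'text' not in data[i]:
            data.remove(data[i])    # the list shrinks, but the range does not
except IndexError as exc:
    error = str(exc)                # same "list index out of range" as in the question

print(error)
```

Note that `{'id': 2}` also gets silently skipped: after the first removal, every later element shifts one position to the left, past the index the loop is about to check.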

Try this instead:

ok_tweets = [x for x in tweets_data if 'text' in x]

A different approach might suit you:

new_tweet_data = [tweet for tweet in tweets_data if 'text' in tweet]

I guess this one works...

clean_data = []
for tweet in tweets_data:
    try:
        print(tweet['text'])
        clean_data.append(tweet['text'])  # collects the texts; append `tweet` instead to keep the dicts
    except KeyError:
        pass                              # skip tweets without a 'text' key

If your data is of reasonable size, I'd recommend a filtered list comprehension, as others have previously suggested:

filtered = [tweet for tweet in tweets_data if 'text' in tweet]
OTOH, if your list is LARGE and the defective items (those you want to remove) are just a few, an approach based on .remove() may be faster, avoiding the intermediate step of creating a LARGE new list:

delenda = [defective for defective in tweets_data if 'text' not in defective]
for tweet in delenda:
    tweets_data.remove(tweet)
Beware that each .remove() has to scan the whole list, so this approach could be competitive only when the ratio of items to remove is very small.

If you need to deliver a product based on this question, I heartily recommend timing the different approaches with samples of your data.
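For example, a rough timing harness (a sketch: the list size and the 1% ratio of defective items are made-up numbers, to be replaced with samples of your real data):

```python
import timeit

# 10,000 fake tweets, roughly 1% of them missing the 'text' key (made-up ratio)
tweets_data = [{'id': i} if i % 100 == 0 else {'text': 'hi'} for i in range(10_000)]

def by_comprehension():
    return [t for t in tweets_data if 'text' in t]

def by_remove():
    data = list(tweets_data)                        # work on a copy
    delenda = [t for t in data if 'text' not in t]
    for t in delenda:
        data.remove(t)                              # each call scans from the front
    return data

print('comprehension:', timeit.timeit(by_comprehension, number=10))
print('remove       :', timeit.timeit(by_remove, number=10))
```

Both functions return the same cleaned list; only the wall-clock numbers differ, and they depend heavily on the size and defect ratio of your data.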

Having read https://wiki.python.org/moin/TimeComplexity , in particular:

Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move).

I have struck out my previous answer, which suggested using .remove() to avoid copying a possibly LARGE list, because it turns out that every .remove() is, in effect, COPYING a possibly large part of the list.

The right thing to do is indeed a list comprehension.
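As a closing note: if the list has to be cleaned in place (say, because other names refer to the same list object), the comprehension combines with a slice assignment; tweets_data here is a toy stand-in:

```python
# Toy stand-in; in the question, tweets_data comes from the raw tweet feed.
tweets_data = [{'text': 'a'}, {'id': 1}, {'text': 'b'}]
alias = tweets_data                 # a second reference to the same list object

# Build the filtered list, then overwrite the original list's contents,
# so every existing reference sees the cleaned data.
tweets_data[:] = [tweet for tweet in tweets_data if 'text' in tweet]

print(alias)
```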
