How to remove all non-numerical values from a list in a dictionary?

I hope the question isn't too confusing. I have made a little web scraper that scrapes a website for real estate information (price, size in square meters, number of bedrooms, etc.). I have stored these pieces of information in a dictionary of lists, where each index represents one real estate listing, like this:

info_dict = {"prices": ["1200000", "1400000", "1000000", "-"], "sizes": ["120", "140", "90", "100"], "bedrooms": ["2", "3", "2", "1"]}

My problem is that I'm going to do analysis on this information, e.g. price per square meter, and some of my values are not formatted correctly, like index 3 in info_dict["prices"]. For a non-numerical value like this, I want to remove it from the dictionary along with the other values at the same index (sizes, bedrooms). Any ideas on how I can accomplish this?

You could get the valid triplets and then either build another dictionary from them or use them directly. Note that the check below only tests the first value of each triplet (the price).

>>> res = [(x,y,z) for x,y,z in zip(*info_dict.values()) if x.isdigit()]
>>> res
[('1200000', '120', '2'), ('1400000', '140', '3'), ('1000000', '90', '2')]
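As a rough sketch of the "build another dictionary" step, the surviving rows can be turned back into one list per key. The names `rows`, `clean_dict` and `price_per_sqm` are illustrative and not from the original answer. This variant also assumes that every field, not just the price, should be numeric.

```python
info_dict = {
    "prices": ["1200000", "1400000", "1000000", "-"],
    "sizes": ["120", "140", "90", "100"],
    "bedrooms": ["2", "3", "2", "1"],
}

# Keep only the listings (rows) in which every field consists of digits.
# Assumption: all three fields must be numeric, not only the price.
rows = [row for row in zip(*info_dict.values()) if all(v.isdigit() for v in row)]

# Rebuild one list per key from the surviving rows.
# Indexing by position keeps every key even if no row survives.
clean_dict = {key: [row[i] for row in rows] for i, key in enumerate(info_dict)}

# Example analysis on the cleaned data: price per square meter.
price_per_sqm = [
    int(price) / int(size)
    for price, size in zip(clean_dict["prices"], clean_dict["sizes"])
]
```

Because whole rows are dropped, the three lists stay the same length and index `i` still refers to the same listing in each of them.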

There are a bunch of different ways to do this. One approach is to use regular expressions.

import re

# Raw string avoids the invalid-escape warning for \d in newer Python versions
digit_pattern = re.compile(r'\d+')

info_dict = {"prices" : ["1200000", "1400000", "1000000", "-"], "sizes":["120", "140", "90", "100"], "bedrooms": ["2", "3", "2", "1"]}

pattern_info_dict = {key:[rec for rec in value if digit_pattern.match(str(rec))] for key, value in info_dict.items()}

pattern_info_dict
{'prices': ['1200000', '1400000', '1000000'],
 'sizes': ['120', '140', '90', '100'],
 'bedrooms': ['2', '3', '2', '1']}

If the format of the values changes in the future, you only need to change the pattern and the rest of the code should still work.
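Note that the comprehension above filters each list independently, so `sizes` and `bedrooms` keep their entry at index 3 and the lists end up with different lengths. A possible row-wise variant of the same regex idea is sketched below. The helper name `drop_invalid_rows` and the decimal-aware pattern are assumptions for illustration, not part of the original answer. It uses `fullmatch` rather than `match`, so a value such as "12abc" is rejected instead of being accepted on its numeric prefix.

```python
import re

# Hypothetical pattern: digits with an optional decimal part.
# Adjust it to whatever "numeric" means for the scraped data.
numeric_pattern = re.compile(r"\d+(?:\.\d+)?")


def drop_invalid_rows(data, pattern):
    """Return a new dict without the indexes at which any value fails the pattern."""
    keys = list(data)
    # Walk the lists in parallel so each tuple is one listing.
    rows = zip(*(data[key] for key in keys))
    # fullmatch requires the whole string to match, not just a prefix.
    kept = [row for row in rows if all(pattern.fullmatch(str(v)) for v in row)]
    return {key: [row[i] for row in kept] for i, key in enumerate(keys)}


info_dict = {
    "prices": ["1200000", "1400000", "1000000", "-"],
    "sizes": ["120", "140", "90", "100"],
    "bedrooms": ["2", "3", "2", "1"],
}

clean_info_dict = drop_invalid_rows(info_dict, numeric_pattern)
```

With the sample data, the listing at index 3 is removed from all three lists, which is what the question asks for.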
