Filter Dictionary According to Existing List
Still a Python novice, so please go easy on me...
I've got a dictionary set up:
new_dict
I'd like to filter it to return the keys for which any of the values attached to the key match a value in an existing list I have set up:
list(data.Mapped_gene)
Any ideas?
Edit: I still haven't been able to make this work.
The CSV tables and keys are all strings, if that helps.
Here is the full code to broaden understanding:
import csv
new_dict = {}
with open(raw_input("Enter csv file (including path)"), 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        if row[0] in new_dict:
            new_dict[row[0]].append(row[1:])
        else:
            new_dict[row[0]] = row[1:]
print new_dict
#modified from: http://bit.ly/1iOS7Gu
import pandas
colnames = ['Date Added to Catalog', 'PUBMEDID', 'First Author', 'Date', 'Journal', 'Link', 'Study', 'DT', 'Initial Sample Size', 'Replication Sample Size', 'Region', 'Chr_id', 'Chr_pos', 'Reported Gene(s)', 'Mapped_gene', 'p-Value', 'Pvalue_mlog', 'p-Value (text)', 'OR or beta', '95% CI (text)', 'Platform [SNPs passing QC]', 'CNV']
data = pandas.read_csv('C:\Users\Chris\Desktop\gwascatalog.csv', names=colnames)
my_list = list(data.Mapped_gene)
my_set = set(my_list)
[k for k, v in new_dict.items() if any(x in my_set for x in v)]
Error Message: "TypeError: unhashable type: 'list'"
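One likely cause of this error, assuming the CSV repeats a key in its first column: append(row[1:]) adds the whole slice as a single nested list, so some values in new_dict are themselves lists, and checking a list for membership in a set requires hashing it. A minimal sketch of the difference between append and extend follows. The rows and gene names are made-up placeholders, and the syntax is Python 3 rather than the Python 2 used above.

```python
# Hypothetical rows standing in for what csv.reader would yield.
rows = [
    ["snp1", "GENE_A", "GENE_B"],
    ["snp1", "GENE_C"],
    ["snp2", "GENE_D"],
]

nested = {}
flat = {}
for row in rows:
    key, values = row[0], row[1:]
    if key in nested:
        nested[key].append(values)  # adds one element: the list itself
        flat[key].extend(values)    # adds each string separately
    else:
        nested[key] = list(values)
        flat[key] = list(values)

print(nested["snp1"])  # ['GENE_A', 'GENE_B', ['GENE_C']]
print(flat["snp1"])    # ['GENE_A', 'GENE_B', 'GENE_C']

my_set = {"GENE_C"}
try:
    [k for k, v in nested.items() if any(x in my_set for x in v)]
except TypeError as exc:
    error = str(exc)  # the nested list cannot be hashed
else:
    error = None

keys = [k for k, v in flat.items() if any(x in my_set for x in v)]
print(keys)  # ['snp1']
```

With extend, every value stays a flat list of strings, so the membership test works as intended.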
Use any and a list comprehension:
my_list = list(data.Mapped_gene)
keys = [k for k, v in new_dict.items() if any(x in my_list for x in v)]
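To see what this returns, here is a small self-contained sketch. The dictionary contents and gene names are invented for illustration and stand in for new_dict and list(data.Mapped_gene); it assumes every value is a flat list of strings.

```python
# Hypothetical stand-ins for new_dict and list(data.Mapped_gene).
new_dict = {
    "rs1": ["BRCA1", "TP53"],
    "rs2": ["EGFR"],
    "rs3": ["MYC", "KRAS"],
}
my_list = ["TP53", "KRAS", "APOE"]

# Keep a key as soon as one of its values appears in my_list.
keys = [k for k, v in new_dict.items() if any(x in my_list for x in v)]
print(keys)  # ['rs1', 'rs3']
```

any stops at the first match, so a key with many values is not scanned further once one value is found.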
If my_list is huge, convert it to a set first, since a set provides O(1) average-case lookup.
geneset = set(data.Mapped_gene)
[k for k, v in new_dict.items() if geneset.intersection(v)]
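A short sketch of why the intersection form works as a filter: set.intersection accepts any iterable, and an empty result is falsy. The data below is made up for illustration, and the values are assumed to be flat lists of strings.

```python
# Hypothetical stand-ins for new_dict and data.Mapped_gene.
new_dict = {
    "rs1": ["BRCA1", "TP53"],
    "rs2": ["EGFR"],
    "rs3": ["MYC", "KRAS"],
}
geneset = set(["TP53", "KRAS", "APOE"])

# intersection() returns the shared genes; an empty set means no match.
matches = {k: geneset.intersection(v) for k, v in new_dict.items()}
keys = [k for k, hits in matches.items() if hits]

print(matches["rs1"])  # {'TP53'}
print(matches["rs2"])  # set()
print(keys)            # ['rs1', 'rs3']
```

Unlike any, intersection always examines every value for a key, but it also tells you which genes matched.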
To speed up the lookup, convert the list to a set.
gene_set = set(data.Mapped_gene)
Then use a list comprehension as shown in the other answers, or a dictionary comprehension if you are interested in the values as well.
{k: v for k, v in my_dict.iteritems() if any(x in gene_set for x in v)}
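A sketch of the dictionary-comprehension variant for the case in the question, where each value is a list of strings. It tests the elements with any, because a list itself cannot be looked up in a set. The sample data is invented, and the code is written for Python 3, where items() takes the place of iteritems().

```python
# Hypothetical stand-ins for my_dict and set(data.Mapped_gene).
my_dict = {
    "rs1": ["BRCA1", "TP53"],
    "rs2": ["EGFR"],
    "rs3": ["MYC", "KRAS"],
}
gene_set = {"TP53", "KRAS", "APOE"}

# Keep both the key and its full list of values when any value matches.
filtered = {
    k: v for k, v in my_dict.items()
    if any(x in gene_set for x in v)
}
print(filtered)  # {'rs1': ['BRCA1', 'TP53'], 'rs3': ['MYC', 'KRAS']}
```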
The iteritems() method on my_dict is especially useful if my_dict is huge, because in Python 2 it yields the pairs lazily instead of building a full list of them as items() does. To make your approach more memory efficient, you can use a generator expression instead of a list or dictionary comprehension:
(k for k, v in my_dict.iteritems() if any(x in gene_set for x in v))
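A small sketch of how such a generator is consumed, assuming list-valued entries as in the question. The data is made up, and the code uses Python 3, where items() is already lazy.

```python
# Hypothetical stand-ins for my_dict and set(data.Mapped_gene).
my_dict = {
    "rs1": ["BRCA1", "TP53"],
    "rs2": ["EGFR"],
    "rs3": ["MYC", "KRAS"],
}
gene_set = {"TP53", "KRAS", "APOE"}

# Nothing is evaluated yet; keys are produced one at a time on demand.
matching = (k for k, v in my_dict.items() if any(x in gene_set for x in v))

first = next(matching)  # pulls only the first matching key
rest = list(matching)   # consumes whatever is left
print(first)  # rs1
print(rest)   # ['rs3']
```

A generator can be iterated only once, so wrap it in list() if you need to go over the keys again.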