Filter a dictionary based on an existing list

Question

I'm still a Python novice, so please go easy on me...

I've got a dictionary set up:

new_dict

I'd like to filter it to return the keys where any of the values attached to a key matches a value in an existing list I have set up:

list(data.Mapped_gene)

Any ideas?

Edit: I still haven't been able to make this work.

The CSV tables and keys are all strings, if that helps.

Here is the full code, for context:

import csv
new_dict = {}
with open(raw_input("Enter csv file (including path)"), 'rb') as f:
  reader = csv.reader(f)
  for row in reader:
    if row[0] in new_dict:
      new_dict[row[0]].append(row[1:])
    else:
      new_dict[row[0]] = row[1:]
print new_dict

#modified from: http://bit.ly/1iOS7Gu
import pandas
colnames = ['Date Added to Catalog',    'PUBMEDID', 'First Author', 'Date',     'Journal',  'Link', 'Study',    'DT',   'Initial Sample Size',  'Replication Sample Size',  'Region',   'Chr_id',   'Chr_pos',  'Reported Gene(s)', 'Mapped_gene',  'p-Value',  'Pvalue_mlog',  'p-Value (text)',   'OR or beta',   '95% CI (text)',    'Platform [SNPs passing QC]',   'CNV']
data = pandas.read_csv(r'C:\Users\Chris\Desktop\gwascatalog.csv', names=colnames)  # raw string keeps the backslashes literal


my_list = list(data.Mapped_gene)
my_set = set(my_list)

[k for k, v in new_dict.items() if any(x in my_set for x in v)]

Error message: "TypeError: unhashable type: 'list'"

Answer 1

Use any() and a list comprehension:

my_list = list(data.Mapped_gene)
keys = [k for k, v in new_dict.items() if any(x in my_list for x in v)]

If my_list is huge, convert it to a set first, since a set provides O(1) average-case lookup.
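
A minimal sketch of why the set version can still raise "TypeError: unhashable type: 'list'" with the dictionary built in the question's edit: append(row[1:]) stores a whole list as a single element, so the values for repeated keys end up nested, and a list cannot be hashed for a set lookup. Using extend keeps every value flat. The rows and gene names below are invented sample data, not taken from the real CSV files.

```python
# Hypothetical rows standing in for the CSV contents (first column is the key).
rows = [
    ["snp1", "GENE_A", "GENE_B"],
    ["snp2", "GENE_C"],
    ["snp1", "GENE_D"],  # repeated key, as handled by the question's loop
]

new_dict = {}
for row in rows:
    # setdefault creates the list the first time a key is seen;
    # extend adds the strings themselves rather than a nested list.
    new_dict.setdefault(row[0], []).extend(row[1:])

my_set = set(["GENE_D", "GENE_X"])  # stands in for set(data.Mapped_gene)

keys = [k for k, v in new_dict.items() if any(x in my_set for x in v)]
print(keys)  # ['snp1']
```

With flat lists of strings, every x passed to "x in my_set" is hashable, so the lookup works as intended.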

Answer 2

geneset = set(data.Mapped_gene)
[k for k, v in new_dict.items() if geneset.intersection(v)]
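
As a small illustration of the intersection approach, assuming each value is a flat list of strings: set.intersection accepts any iterable and returns an empty set, which is falsy, when nothing overlaps. The dictionary and gene names here are made-up sample data.

```python
# Made-up sample data; in the question these come from the two CSV files.
new_dict = {
    "snp1": ["GENE_A", "GENE_B"],
    "snp2": ["GENE_C"],
    "snp3": ["GENE_B", "GENE_E"],
}
geneset = set(["GENE_B", "GENE_Z"])

# An empty intersection is falsy, so only keys sharing at least one gene pass.
matching = sorted(k for k, v in new_dict.items() if geneset.intersection(v))
print(matching)  # ['snp1', 'snp3']
```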

Answer 3

To speed up the lookup, convert the list to a set.

gene_set = set(data.Mapped_gene)

Then use a list comprehension as shown in the other answers, or a dictionary comprehension if you are interested in the values as well. Each value is a list, so test its elements against the set rather than the list itself (a list is not hashable):

{k: v for k, v in my_dict.iteritems() if gene_set.intersection(v)}

The iteritems() method on my_dict is especially useful if my_dict is huge, because in Python 2 it iterates lazily instead of building a full list of pairs. To make this more memory efficient still, you can use a generator expression instead of a list or dictionary comprehension:

(k for k, v in my_dict.iteritems() if gene_set.intersection(v))
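
A sketch of the lazy version on invented sample data. It uses items() so that it also runs on Python 3, where iteritems() no longer exists; on Python 2, iteritems() avoids building the intermediate list. The membership test checks each element of the value, since a list itself cannot be looked up in a set.

```python
# Invented sample data for illustration only.
my_dict = {
    "snp1": ["GENE_A", "GENE_B"],
    "snp2": ["GENE_C"],
}
gene_set = set(["GENE_C"])

# Generator expression: nothing is evaluated until it is iterated.
matches = (k for k, v in my_dict.items() if any(x in gene_set for x in v))

first = next(matches, None)  # pull only the first matching key
print(first)  # snp2
```

Because the generator yields keys one at a time, it suits cases where only the first few matches are needed or the result is consumed in a single pass.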

3 answers

Answer 1
3 2014-02-12 16:13:26

Answer 2
2 2014-02-12 16:15:55

Answer 3
0 2014-02-12 16:15:54
