
Dictionary containing list data, filter based on value in list

I have test data that is gathered from multiple inputs and results in a single output. I'm currently storing this data in a dictionary whose keys are my parameter/result labels and whose values are the test conditions and results. I would like to filter the data so I can generate plots based on isolated conditions.

In my example below, my test conditions are 'a' and 'b', and the result of the experiment is 'c'. I want to filter my data so that I get a dictionary with the same key/value structure containing only my filtered results. However, my current dictionary comprehension returns an empty dictionary. Any advice on how to get the desired result?

Current Code:

data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filtered_data = {k:v for k,v in data.iteritems() if v in data['b'] >= 20}

Desired Result:

{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}

Current Result:

{}

Also, is this dictionary of lists a good schema for storing data of this type, given that I'm going to want to filter the results, or is there a better way to accomplish this?

Consider using the pandas module for this type of work.

import pandas as pd
df = pd.DataFrame(data)
df = df[df["b"] >= 20]
print(df)

This appears to give you what you want. You are using the dictionary keys as column names, and the values are the rows of each column, so the data fits naturally into a DataFrame.

Result:

   a   b    c
3  0  20  2.3
4  1  20  2.9
5  2  20  3.4

Use this:

filtered_data = {k: [x for i, x in enumerate(v) if data['b'][i] >= 20] for k, v in data.items()}

Result:

{'a': [0, 1, 2], 'c': [2.3, 2.9, 3.4], 'b': [20, 20, 20]}
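A variant of the same idea, as a minimal sketch: compute the True/False mask from 'b' once and apply it to every list with `itertools.compress` from the standard library. The names `mask` and `filtered_data` are only illustrative.

```python
from itertools import compress

data = {'a': [0, 1, 2, 0, 1, 2],
        'b': [10, 10, 10, 20, 20, 20],
        'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}

# One True/False flag per position, computed once from the 'b' list.
mask = [value >= 20 for value in data['b']]

# compress() keeps only the items whose matching flag is true.
filtered_data = {k: list(compress(v, mask)) for k, v in data.items()}

print(filtered_data)
```

This assumes all lists have the same length and are in matching order, which is the case for the data in the question.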

Are all of the dictionary's value lists in matching order? If so, you could look at whichever list you want to filter by, 'b' in this case, find the values you want, and then use either those indices or the same slice on the other values in the dictionary.

For example:

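# Collect the positions (not the values) in 'b' that satisfy the condition.
matching_indices = []
for i, value in enumerate(data['b']):
    if value >= 20:
        matching_indices.append(i)
# Pick the same positions out of every list in the dictionary.
new_dict = {}
for key in data:
    new_dict[key] = [data[key][i] for i in matching_indices]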

You could probably write a dictionary comprehension for it if you wanted. Hopefully this is clear.

You can turn this into a function, which would give it more flexibility. With your current logic, datasets a and c are neglected because they contain no values greater than or equal to 20:

data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filter_vals = ['a', 'b']
new_d = {}
for k, v in data.items():
  if k in filter_vals:
    # Each selected list is filtered on its own values.
    new_d[k] = [i for i in v if i >= 20]

print(new_d)

Now, I'm not a big fan of many if statements, but something like this is straightforward and can be called many times:

def my_filter(operator, condition, filter_vals, my_dict):
  new_d = {}
  for k, v in my_dict.items():
    if k in filter_vals:
      if operator == '>':
        new_d[k] = [i for i in v if i > condition]
      elif operator == '<':
        new_d[k] = [i for i in v if i < condition]
      elif operator == '<=':
        new_d[k] = [i for i in v if i <= condition]
      elif operator == '>=':
        new_d[k] = [i for i in v if i >= condition]
  return new_d
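
If the chain of if statements is a concern, one possible alternative is a lookup table built on the standard `operator` module. The sketch below is not the function above: it keeps, in every list, the positions where one chosen list passes the test, which is the behaviour shown in the question's desired result. The names `COMPARISONS` and `filter_rows` are hypothetical.

```python
import operator

# Map each operator string to the matching function from the operator module.
COMPARISONS = {'>': operator.gt, '<': operator.lt,
               '>=': operator.ge, '<=': operator.le}

def filter_rows(data, column, op, threshold):
    """Keep, in every list, the positions where data[column] passes the test."""
    compare = COMPARISONS[op]  # raises KeyError for an unsupported operator
    keep = [i for i, value in enumerate(data[column]) if compare(value, threshold)]
    return {k: [v[i] for i in keep] for k, v in data.items()}

data = {'a': [0, 1, 2, 0, 1, 2],
        'b': [10, 10, 10, 20, 20, 20],
        'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}

print(filter_rows(data, 'b', '>=', 20))
```

Adding another comparison then only requires a new entry in the table, for example `'==': operator.eq`.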

I agree with the pandas approach above.

If for some reason you hate pandas or are an old-school computer scientist, tuples are a good way to store relational data. In your example, the a, b, and c lists are columns rather than rows. For tuples, you would want to store the rows as:

data = {'a':(0,10,1.3),'b':(1,10,1.9),'c':(2,10,2.3),'d':(0,20,2.3),'e':(1,20,2.9),'f':(2,20,3.4)}

where the tuples are stored in the (condition1, condition2, outcome) format you described, and you can look up a single test or filter a set as you describe. From there you can get a filtered set of results as follows:

filtered_data = {k: v for k, v in data.items() if v[1] >= 20}

which returns:

{'d': (0, 20, 2.3), 'e': (1, 20, 2.9), 'f': (2, 20, 3.4)}
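
If the data already exists as a dictionary of lists, as in the question, `zip` can convert it to row tuples and back again. This is only a sketch under the assumption that all lists have the same length; the names `columns`, `rows`, `kept`, and `b_index` are illustrative.

```python
data = {'a': [0, 1, 2, 0, 1, 2],
        'b': [10, 10, 10, 20, 20, 20],
        'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}

columns = list(data)  # column labels, in dictionary order

# Transpose the columns into one (a, b, c) tuple per test run.
rows = list(zip(*(data[name] for name in columns)))

# Filter the rows on the field that came from column 'b'.
b_index = columns.index('b')
kept = [row for row in rows if row[b_index] >= 20]

# Transpose back into a dictionary of lists.
# Note: if no row matches, this produces an empty dictionary.
filtered_data = {name: list(values) for name, values in zip(columns, zip(*kept))}

print(filtered_data)
```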
