pandas: filter df based on condition

Let's say I have a dataframe like:

A B 
11 2             # PASS 
22 4             # FAIL
33 5             # FAIL
44 4             # PASS

And two dicts like:

B_column_dct = {2: [2,3,5], 4: [33,22,121], 5: [1,2,3]}    # the dict key will have multiple values in a list
A_column_dct = {11: [3], 22: [4], 33: [5], 44: [22]}  # the dict key will always have a single value in a list

Now I want to filter the above dataframe so that a row is kept only if the value that A_column_dct holds for the row's A value is present in the list that B_column_dct holds for the row's B value.

The final result df:

A B 
11 2            
44 4

      

Sorry to say, but I cannot completely make sense of your values and the filtered df you're trying to create, primarily because a dict cannot hold duplicate keys (i.e. in the B column of the original df, the value 4 appears twice, so it will not work properly). I tried to get it to work anyway, assuming that the key 4 in B_column_dct represents both of the column B values, but then I didn't arrive at the same filtered df as you did. Anyway, below is the code I used (possibly the longest one-liner I've made so far; I would advise rewriting it for readability):

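# Flatten all lists in each dict into one list of unique values
flat_a = list(set().union(*A_column_dct.values()))
flat_b = list(set().union(*B_column_dct.values()))

# Keep a row if any of its A values occurs anywhere in flat_b
# and any of its B values occurs anywhere in flat_a
filtering = [
    any(elem_a in flat_b for elem_a in A_column_dct[i])
    and any(elem_b in flat_a for elem_b in B_column_dct[j])
    for i, j in zip(org_df["A"], org_df["B"])
]

filtered_df = org_df[filtering]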
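
For comparison, here is a minimal sketch of a per-row reading of the question: look up each row's A value in A_column_dct and its B value in B_column_dct, and keep the row only if the A-side value appears in that B-side list. This is one interpretation of the condition, not necessarily what the asker had in mind. The helper name row_passes and the choice to treat missing keys as a failed row are assumptions made for illustration; the data is copied from the question.

```python
import pandas as pd

# Data copied from the question
org_df = pd.DataFrame({"A": [11, 22, 33, 44], "B": [2, 4, 5, 4]})
B_column_dct = {2: [2, 3, 5], 4: [33, 22, 121], 5: [1, 2, 3]}
A_column_dct = {11: [3], 22: [4], 33: [5], 44: [22]}


def row_passes(a, b):
    """Hypothetical helper: True if a value listed for `a` is in the list for `b`."""
    # Assumption: a key missing from either dict means the row fails
    a_values = A_column_dct.get(a, [])
    b_values = B_column_dct.get(b, [])
    return any(value in b_values for value in a_values)


# Build a boolean mask row by row, then index the dataframe with it
mask = [row_passes(a, b) for a, b in zip(org_df["A"], org_df["B"])]
filtered_df = org_df[mask]
print(filtered_df)
```

With the data above, the mask is True only for the rows (11, 2) and (44, 4), which matches the PASS/FAIL comments in the question. Looking up the same key 4 for two different rows is not a problem here, since each row does its own dict lookup.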


 