
pandas: filter df based on condition

Let's say I have a dataframe like:

A B 
11 2             # PASS 
22 4             # FAIL
33 5             # FAIL
44 4             # PASS

And two dicts like:

B_column_dct = {2: [2,3,5], 4: [33,22,121], 5: [1,2,3]}    # each key maps to a list of multiple values
A_column_dct = {11: [3], 22: [4], 33: [5], 44: [22]}  # each key always maps to a single-value list

Now I want to filter the above dataframe so that a row is kept only if the value that A_column_dct holds for the row's A entry is present in the list that B_column_dct holds for the row's B entry.

The final result df:

A B 
11 2            
44 4

      

Sorry to say, but I can't completely make sense of your values and the filtered df you're trying to create, primarily because a dict cannot hold duplicate keys (i.e. the value 4, which appears twice in column B of the original df, will not work properly). I tried to get it to work anyway, assuming that the key 4 in B_column_dct represents BOTH of the column B values, but then I didn't arrive at the same filtered df as you did. Anyway, below is the code I used (possibly the longest one-liner I've made so far; I would advise rewriting it for readability):

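# org_df is the dataframe from the question; the two dicts are as defined there.
# Flatten all the lists in each dict into one collection of unique values.
flat_a = list(set().union(*A_column_dct.values()))
flat_b = list(set().union(*B_column_dct.values()))

# One boolean per row: True if the row's A list overlaps flat_b and its B list overlaps flat_a.
filtering = [(any(elem_a in flat_b for elem_a in A_column_dct[i])) and (any(elem_b in flat_a for elem_b in B_column_dct[j])) for i, j in zip(org_df["A"], org_df["B"])]

filtered_df = org_df[filtering]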
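
For comparison, here is a minimal sketch of a different reading of the question: each row is checked only against the lists for its own A and B values, not against the flattened values of the whole dict. Under that reading the repeated 4 in column B is not a problem, since both rows simply look up the same key. The dataframe is rebuilt here from the values in the question, and the names org_df and keep are assumptions made for illustration.

```python
import pandas as pd

# Data copied from the question; the name org_df is assumed.
org_df = pd.DataFrame({"A": [11, 22, 33, 44], "B": [2, 4, 5, 4]})

B_column_dct = {2: [2, 3, 5], 4: [33, 22, 121], 5: [1, 2, 3]}
A_column_dct = {11: [3], 22: [4], 33: [5], 44: [22]}

# For each row, look up the list for its own A value and its own B value,
# and keep the row if the two lists share at least one element.
keep = [
    not set(A_column_dct[a]).isdisjoint(B_column_dct[b])
    for a, b in zip(org_df["A"], org_df["B"])
]

filtered_df = org_df[keep]
print(filtered_df)  # rows (11, 2) and (44, 4) remain
```

This sketch assumes every value in columns A and B exists as a key in its dict; a missing key would raise a KeyError, so dict.get with an empty-list default may be preferable on real data.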


 