pandas: filter df based on condition

Question

Let's say I have a dataframe like:

A B 
11 2             # PASS 
22 4             # FAIL
33 5             # FAIL
44 4             # PASS

And two dicts like:

B_column_dct = {2: [2,3,5], 4: [33,22,121], 5: [1,2,3]}    # the dict key will have multiple values in a list
A_column_dct = {11: [3], 22: [4], 33: [5], 44: [22]}  # the dict key will always have a single value in a list

Now I want to filter the above dataframe so that a row is kept only if the value A_column_dct holds for that row's A is present in the list B_column_dct holds for that row's B.

The final result df:

A B 
11 2            
44 4

Answer 1

Sorry to say, but I cannot completely make sense of your values and the filtered df that you're trying to create, primarily because a dict cannot hold duplicate keys (i.e. the value 4, which appears twice in column B of the original df, can only map to one list). I tried to get it to work anyway, assuming that the key 4 in B_column_dct applies to BOTH of those column-B rows, but I didn't arrive at the same filtered df as you did. Anyway, below is the code I used (possibly the longest one-liner I've written so far; I would advise rewriting it for readability):

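# org_df is the dataframe from the question.
# Pool every value from each dict's lists into one flat list.
flat_a = list(set().union(*A_column_dct.values()))
flat_b = list(set().union(*B_column_dct.values()))

# True when the row's A list shares a value with the pooled B values
# and the row's B list shares a value with the pooled A values.
filtering = [(any(elem_a in flat_b for elem_a in A_column_dct[i])) and (any(elem_b in flat_a for elem_b in B_column_dct[j])) for i, j in zip(org_df["A"], org_df["B"])]

filtered_df = org_df[filtering]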
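
For comparison, here is a minimal sketch that reads the question's condition row by row instead of pooling all values: a row is kept when the single value in A_column_dct for its A appears in the B_column_dct list for its B. The name org_df follows the code above, and the DataFrame construction is an assumption based on the table in the question, so treat this as one possible reading rather than a definitive solution.

```python
import pandas as pd

# Assumed reconstruction of the dataframe shown in the question.
org_df = pd.DataFrame({"A": [11, 22, 33, 44], "B": [2, 4, 5, 4]})

B_column_dct = {2: [2, 3, 5], 4: [33, 22, 121], 5: [1, 2, 3]}
A_column_dct = {11: [3], 22: [4], 33: [5], 44: [22]}

# For each row, take the single value stored for A and check whether it
# appears in the list stored for that same row's B.
mask = [
    A_column_dct[a][0] in B_column_dct[b]
    for a, b in zip(org_df["A"], org_df["B"])
]

filtered_df = org_df[mask]
print(filtered_df)  # keeps the rows (11, 2) and (44, 4)
```

With the values from the question, this keeps the two rows marked PASS. A repeated value in column B is not a problem here, because both rows with B equal to 4 simply look up the same key in B_column_dct.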
