Conditions based on multiple column values

Question

I have a dataframe that looks like this.

I want to add a column 'Correct_entry' that returns 'True' or 'False' based on the combination of values in the current columns. In total I have 24 possible combinations that are correct. I give two examples:

1.

If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Red block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)'

If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Green block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)'

I have 24 different combinations of values that are all correct and should equal 'True'. All other combinations should equal to 'False'.

What would be the best way to go through this set of data? How can I combine all these different statements?

I hope it is clear.

edit: As requested, the data in text format.

    Hazard_type  Card_type   Round  Scenario    Payment_type
244 Drought      Green block    2   scenario B  Two payments (two consecutive sums)
643 Drought      Red block      4   scenario A  Two payments (two consecutive sums)
584 Drought      Red block      4   scenario A  One payment (lump sum)
242 Drought      Red block      2   scenario B  Two payments (two consecutive sums)
1039 Drought     Green block    6   scenario A  Two payments (two consecutive sums)
101 Flood        Red block      1   scenario A  Two payments (two consecutive sums)

Answer 1

generate data to match your DF
use np.select() to return the matched condition or np.nan for transparency
then simple to get to bool

s = 200
df = pd.DataFrame({"Hazard_type":np.random.choice(["Drought","Flood"],s),
             "Card_type":np.random.choice(["Red block","Green block"],s),
             "Round":np.random.randint(1,7,s),
             "Scenario":np.random.choice(["scenario A","scenario B"],s),
             "Payment_type":np.random.choice(["One payment (lump sum)","Two payments"],s)})

conditions = [
    # condition 0
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Red block') & (df['Round'] == 1) 
 & (df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
),    
    # condition 1
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Green block') & (df['Round'] == 1) & 
 (df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
)]

df = df.assign(Correct_case=np.select(conditions, [c for c in range(len(conditions))], np.nan),
          Correct_entry=lambda dfa: ~dfa.Correct_case.isna())

sample output of Correct_entry==True

	Hazard_type	Card_type	Round	Scenario	Payment_type	Correct_case	Correct_entry
47	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
50	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
72	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
104	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
120	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
170	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
186	Drought	Green block	1	scenario A	One payment (lump sum)	1	True

Answer 2

I would approach this by creating a series of boolean masks depending on the scenerio

# boolean masks
drought_mask = df['Hazard_type'] == 'Drought'
red_block_mask = df['Card_type'] == 'Red block'
green_block_mask = df['Card_type'] == 'Green block'
round_1_mask = df['Round'] == 1
scenario_a_mask = df['Scenario'] == 'scenario A'
one_payment_mask = df['Payment_type'] == 'One payment (lump sum)'


# scenerio 1
scenario_1_mask = df.loc[(drought_mask & red_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]
# scenario 2
scenario_2_mask = df.loc[(drought_mask & green_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]

# combining scenerios
df['Correct_entry'] = df.loc[scenario_1_mask | scenario_2_mask]

You can create a mask for every scenario and then combine them in the final correct_entry column using an OR operator (|). This should return a True value for every row which is matches one of the 24 scenarios.

Conditions based on multiple column values

Question

2 answers

solution1
1 ACCPTED 2021-02-12 15:59:23

sample output of Correct_entry==True

solution2
0 2021-02-12 15:47:30

Conditions based on multiple column values

Question

2 answers

solution1 1 ACCPTED 2021-02-12 15:59:23

sample output of Correct_entry==True

solution2 0 2021-02-12 15:47:30

solution1
1 ACCPTED 2021-02-12 15:59:23

solution2
0 2021-02-12 15:47:30