Create a new variable based on 4 other variables

Question

I have a pandas DataFrame in Python called df1 with 4 dichotomous variables, Ordering_1, Ordering_2, Ordering_3 and Ordering_4, which hold True/False values.

I need to create a variable called Clean that is based on these 4 variables. That is, when Ordering_1 == True, Clean should be "Ordering_1"; when Ordering_2 == True, Clean should be "Ordering_2"; and so on. Clean would then be a combination of all the True values from Ordering_1, Ordering_2, Ordering_3 and Ordering_4.

Here is an example of how I would like the variable Clean to be:

I have tried the code below, but it does not work:

df1[Clean] = df1[Ordering_1] + df1[Ordering_1] + df1[Ordering_1] + df1[Ordering_1]

Could anyone please help me do this in Python?

Answer 1

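A general solution that also works when a row has multiple True values: select the columns with DataFrame.filter, then use DataFrame.dot to matrix-multiply the booleans by the column names. Here df is the original DataFrame (called df1 in the question):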

df1 = df.filter(like='Ordering_')

df['Clean'] = df1.dot(df1.columns + ',').str.strip(',')
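
To illustrate the approach above, here is a minimal, self-contained sketch. The sample data, the extra column "other" and the name flags are made up for illustration and are not from the question. Multiplying a boolean by a string keeps the string for True and gives an empty string for False, so the dot product concatenates the names of the True columns in each row.

```python
import pandas as pd

# Hypothetical sample data (not from the question); row 2 has several
# True values and row 3 has none.
df = pd.DataFrame({
    "Ordering_1": [True, False, True, False],
    "Ordering_2": [False, True, True, False],
    "Ordering_3": [False, False, False, False],
    "Ordering_4": [False, False, True, False],
    "other": [1, 2, 3, 4],
})

# Keep only the boolean flag columns whose names contain "Ordering_".
flags = df.filter(like="Ordering_")

# True * "name," -> "name,"; False * "name," -> "".
# The dot product concatenates these per row; the trailing comma is stripped.
df["Clean"] = flags.dot(flags.columns + ",").str.strip(",")

print(df[["Clean"]])
```

Rows with no True value end up with an empty string in Clean.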

Answer 2

If there is only one True value per row, you can use the booleans of each column ("Ordering_1", "Ordering_2", etc.) as a mask with df1.loc.

Note that this is what you get with df1.Ordering_1:

0     True
1    False
2    False
3    False
Name: Ordering_1, dtype: bool

With df1.loc you can use this boolean Series to select only the True rows, in this case only row 0:

So you can code this:

Create a new blank "clean" column: df1["clean"]=""
Set the rows where the Series df1.Ordering_1 is True to "Ordering_1":
df1.loc[df1.Ordering_1,["clean"]] = "Ordering_1"
Proceed with the remaining columns in the same way.
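
The steps above can be sketched as a loop over the four columns. This is only a sketch under the assumption that each row has at most one True value; the sample data is made up to match the Ordering_1 output shown earlier. If a row had several True values, the last matching column would overwrite the earlier ones.

```python
import pandas as pd

# Hypothetical sample data with exactly one True per row.
df1 = pd.DataFrame({
    "Ordering_1": [True, False, False, False],
    "Ordering_2": [False, True, False, False],
    "Ordering_3": [False, False, True, False],
    "Ordering_4": [False, False, False, True],
})

# Start with a blank column, then fill it one flag column at a time.
df1["clean"] = ""
for col in ["Ordering_1", "Ordering_2", "Ordering_3", "Ordering_4"]:
    # df1[col] is a boolean Series, so .loc selects only the True rows.
    df1.loc[df1[col], "clean"] = col

print(df1["clean"])
```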
