
Create a new variable based on 4 other variables

I have a DataFrame in Python called df1 with 4 dichotomous variables, Ordering_1, Ordering_2, Ordering_3 and Ordering_4, holding True/False values.

I need to create a variable called Clean based on these 4 variables: when Ordering_1 == True, Clean should hold Ordering_1; when Ordering_2 == True, Clean should hold Ordering_2; and so on. Clean would then combine all the True values from Ordering_1, Ordering_2, Ordering_3 and Ordering_4.

Here is an example of how I would like the variable Clean to be:

I tried the code below, but it does not work:

df1[Clean] = df1[Ordering_1] + df1[Ordering_1] + df1[Ordering_1] + df1[Ordering_1]

Could anyone please help me do this in Python?

A universal solution that also works when a row has more than one True value: select the columns with DataFrame.filter, then use DataFrame.dot to matrix-multiply the booleans by the column names:

bools = df1.filter(like='Ordering_')

df1['Clean'] = bools.dot(bools.columns + ',').str.strip(',')
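To see why the dot trick works, here is a self-contained sketch on made-up sample data (the values below are hypothetical, not the asker's data). In the product, True * 'Ordering_2,' gives 'Ordering_2,' and False * 'Ordering_2,' gives '', so each row's products concatenate into a comma-separated list of the True column names; strip(',') then removes the trailing comma.

```python
import pandas as pd

# Hypothetical sample data: row 1 has two True values, row 2 has none
df1 = pd.DataFrame({
    "Ordering_1": [True, False, False, False],
    "Ordering_2": [False, True, False, False],
    "Ordering_3": [False, True, False, False],
    "Ordering_4": [False, False, False, True],
})

# Keep only the Ordering_* columns
bools = df1.filter(like="Ordering_")

# Boolean matrix times column names: True * "name," -> "name,", False * "name," -> "",
# and the per-row products are concatenated; strip(",") drops the trailing comma
df1["Clean"] = bools.dot(bools.columns + ",").str.strip(",")

print(df1["Clean"].tolist())
```

A row with no True value ends up with an empty string rather than NaN.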

If there is only one True value per row, you can use each boolean column (Ordering_1, Ordering_2, etc.) as a row mask with df1.loc.

Note that this is what you get with df1.Ordering_1:

    0     True
    1    False
    2    False
    3    False
    Name: Ordering_1, dtype: bool

Passing this boolean Series to df1.loc keeps only the rows where it is True, in this case only row 0.
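A minimal sketch of the masking step, assuming hypothetical sample data with exactly one True per row, where Ordering_1 is True only in row 0:

```python
import pandas as pd

# Hypothetical sample data: exactly one True per row
df1 = pd.DataFrame({
    "Ordering_1": [True, False, False, False],
    "Ordering_2": [False, True, False, False],
    "Ordering_3": [False, False, True, False],
    "Ordering_4": [False, False, False, True],
})

# The boolean column itself acts as the row mask
mask = df1.Ordering_1
selected = df1.loc[mask]

print(selected.index.tolist())  # [0]
```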

So the steps are:

  1. Create a new, empty "clean" column: df1["clean"] = ""

  2. Set "clean" to "Ordering_1" in the rows where the Series df1.Ordering_1 is True:
    df1.loc[df1.Ordering_1, ["clean"]] = "Ordering_1"

  3. Repeat step 2 for the remaining columns (Ordering_2, Ordering_3, Ordering_4).
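The three steps can be folded into a loop over the column names. This is a sketch on hypothetical sample data with one True per row; it keeps the answer's lowercase "clean" name (rename it to "Clean" to match the question). If a row had several True values, the last matching column in the loop would overwrite the earlier ones, which is why the dot-based approach is the one to use in that case.

```python
import pandas as pd

# Hypothetical sample data: exactly one True per row
df1 = pd.DataFrame({
    "Ordering_1": [True, False, False, False],
    "Ordering_2": [False, True, False, False],
    "Ordering_3": [False, False, True, False],
    "Ordering_4": [False, False, False, True],
})

# Step 1: start with an empty label column
df1["clean"] = ""

# Steps 2 and 3: write each column's name into the rows where that column is True
for col in ["Ordering_1", "Ordering_2", "Ordering_3", "Ordering_4"]:
    df1.loc[df1[col], "clean"] = col

print(df1["clean"].tolist())
```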


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM