How can I create a dummy variable in Python with a condition below or above median?

How can I create a binary dummy variable in Python that takes the value 0 when a person's salary is below the median salary and 1 otherwise? I don't understand how to express the "above or below the median" condition.

I tried this:

df['Salary'] = (df['Salary'] > df['Salary'].median()) & (df['Salary'] < df['Salary'].median())

But there is no output.
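The condition in this attempt can never be true: no salary is both strictly above and strictly below the median, so combining the two masks with & gives False on every row. Separately, an assignment statement never displays anything, which is why nothing is printed. A quick check, assuming hypothetical sample rows shaped like the ones shown below:

```python
import pandas as pd

# Hypothetical sample rows, modeled on the ones shown in the question
df = pd.DataFrame({"Salary": [78200, 66400, 6000]})
median = df["Salary"].median()

# Each salary is tested against both sides of the median; no value passes both,
# so the combined mask is False on every row
mask = (df["Salary"] > median) & (df["Salary"] < median)
print(mask.tolist())  # [False, False, False]
```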

Before that I tried this:

df['Salary'].median()
df_Salary = pd.get_dummies(df['Salary'].median())
df_new = pd.concat([df, df_Salary], axis=1)
df_new

And got this

    Gender  Exp  Salary  74000.0
0   Female   15   78200        1
1   Female   12   66400      NaN
2   Female    3    6000      NaN
...
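The extra 74000.0 column appears to come from handing get_dummies a single number (the median) instead of a column: it builds a one-row frame with one column named after that value, and concat(axis=1) aligns on the row index, so only row 0 gets a value and every other row becomes NaN. If you want to stay with get_dummies, a sketch (assuming hypothetical sample rows and a hypothetical column prefix AboveMedian) is to pass it the per-row comparison instead:

```python
import pandas as pd

# Hypothetical sample rows, modeled on the ones in the question
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female"],
    "Exp": [15, 12, 3],
    "Salary": [78200, 66400, 6000],
})

# One True/False per row: True when the salary is at or above the median
above = df["Salary"] >= df["Salary"].median()

# get_dummies makes one column per category (False, True); drop_first=True keeps
# only the True column, and dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(above, prefix="AboveMedian", drop_first=True, dtype=int)

df_new = pd.concat([df, dummies], axis=1)
print(df_new)
```

The comparison already is the dummy, so the plain comparison-and-cast approaches below are simpler; get_dummies pays off mainly for columns with more than two categories.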

You can coerce the booleans to integers by multiplying them by an integer (here, 1):

df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()) * 1
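A self-contained version of that one-liner, assuming a hypothetical three-row frame with the question's sample salaries:

```python
import pandas as pd

# Hypothetical three-row frame mirroring the question's sample data
df = pd.DataFrame({"Salary": [78200, 66400, 6000]})

# Comparing against the median gives a boolean Series; multiplying by 1
# coerces True/False to 1/0 (a salary equal to the median counts as "not below")
df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()) * 1
print(df["Median_Compare"].tolist())  # [1, 1, 0]
```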

You can do a vectorized comparison and convert the result to an int:

>>> df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()).astype(int)
>>> df
   Gender  Exp  Salary  Median_Compare
0  Female   15   78200               1
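1  Female   12   66400               1
2  Female    3    6000               0

This works because comparing the column with its median gives a boolean Series, and astype(int) turns True into 1 and False into 0. A salary exactly equal to the median counts as "not below", so it gets 1:

>>> df["Salary"].median()
66400.0
>>> df["Salary"] >= df["Salary"].median()
0     True
1     True
2    False
Name: Salary, dtype: bool
>>> (df["Salary"] >= df["Salary"].median()).astype(int)
0    1
1    1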
2    0
Name: Salary, dtype: int32
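If you would rather spell out both outcome values, NumPy's np.where is a common alternative (a different technique from the astype cast above). A sketch on the same hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame mirroring the sample data above
df = pd.DataFrame({"Salary": [78200, 66400, 6000]})
median = df["Salary"].median()

# np.where picks 1 where the condition holds and 0 elsewhere, element by element
df["Median_Compare"] = np.where(df["Salary"] >= median, 1, 0)
print(df)
```

The same call works with labels instead of numbers, for example np.where(cond, "above", "below").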

To make the ternary approaches (X if condition else Y) work on a column, you'd need to apply them element by element. A conditional expression needs a single True or False, and a boolean Series has no unambiguous truth value, so pandas raises a ValueError instead.
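A sketch of both behaviours, assuming a hypothetical three-row frame: the bare conditional fails, while Series.apply evaluates the same conditional once per value:

```python
import pandas as pd

# Hypothetical three-row frame mirroring the sample data
df = pd.DataFrame({"Salary": [78200, 66400, 6000]})
median = df["Salary"].median()

# A bare conditional asks for the truth value of a whole Series, which pandas refuses
raised = False
try:
    flag = 1 if df["Salary"] >= median else 0
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# apply runs the conditional once per element, where each comparison is a plain bool
df["Median_Compare"] = df["Salary"].apply(lambda s: 1 if s >= median else 0)
print(df["Median_Compare"].tolist())  # [1, 1, 0]
```

apply loops in Python, so on large frames the vectorized comparison with astype(int) is usually faster.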

I think you want something like this (using your notation and variable names).

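median = df['Salary'].median()
df['Salary'] = df['Salary'].apply(lambda s: 0 if s < median else 1)

This reads almost exactly like English: each salary becomes 0 if it is less than the median, and 1 otherwise. The apply call matters, because a bare 0 if df['Salary'] < median else 1 would try to evaluate the whole Series as a single condition and raise a ValueError. For reference, X if condition else Y is Python's conditional expression, often called the ternary operator. Note that assigning back to df['Salary'] overwrites the original salaries with the flags.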

For a single salary, a basic conditional expression is enough; just store its result in a variable:

median = 30500
salary = 50000
# 1 if the salary is at or above the median, 0 if it is below
median_flag = 1 if salary >= median else 0
print(median_flag)
1
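The same scalar conditional extends to a whole list without pandas. A sketch using the standard library's statistics.median on hypothetical salaries:

```python
import statistics

# Hypothetical salaries; statistics.median sorts them and takes the middle value
# (or the mean of the two middle values for an even-length list)
salaries = [78200, 66400, 6000]
median = statistics.median(salaries)

# The same conditional, applied to each salary in turn
flags = [1 if salary >= median else 0 for salary in salaries]
print(flags)  # [1, 1, 0]
```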
