How can I create a dummy variable in Python with a condition below or above median?
How can I create a binary dummy variable in Python that takes the value 0
when a person's salary is below the median salary and 1 otherwise?
I don't understand how to express the condition for a salary above or below the median.
I tried this:
df['Salary'] = (df['Salary'] > df['Salary'].median()) & (df['Salary'] < df['Salary'].median())
But there is no output.
Before that I tried this:
df['Salary'].median()
df_Salary = pd.get_dummies(df['Salary'].median())
df_new = pd.concat([df, df_Salary], axis=1)
df_new
And got this:
Gender Exp Salary 74000.0
0 Female 15 78200 1
1 Female 12 66400 NaN
2 Female 3 6000 NaN
...
You can coerce the boolean values to integers by multiplying them by an integer:
df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()) * 1
You can do a vectorized comparison and convert the result to an int:
>>> df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()).astype(int)
>>> df
Gender Exp Salary Median_Compare
0 Female 15 78200 1
1 Female 12 66400 1
2 Female 3 6000 0
This works because we have
>>> df["Salary"].median()
66400.0
>>> df["Salary"] >= df["Salary"].median()
0 True
1 True
2 False
Name: Salary, dtype: bool
>>> (df["Salary"] >= df["Salary"].median()).astype(int)
0 1
1 1
2 0
Name: Salary, dtype: int32
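For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the frame holds only the three rows shown above (the real data may have more) and reuses the Median_Compare column name; the median_salary variable is just a convenience name.

```python
import pandas as pd

# Assumption: only the three rows shown above; the real data may have more.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female"],
    "Exp": [15, 12, 3],
    "Salary": [78200, 66400, 6000],
})

# Compute the median once so it is not recalculated in every expression.
median_salary = df["Salary"].median()

# 1 when the salary is at or above the median, 0 when it is below.
df["Median_Compare"] = (df["Salary"] >= median_salary).astype(int)

print(median_salary)
print(df)
```

Note that a salary exactly equal to the median gets 1 here, because the question asks for 0 only when the salary is below the median. Use > instead of >= if ties should map to 0.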
To make the ternary approaches work (X if (condition) else Y), you'd need to apply
them element by element, because they don't play nicely with arrays, which don't have an unambiguous truth value.
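A short sketch of what that looks like in practice, using the sample salaries from the question. The try/except only demonstrates the error; numpy.where is a vectorized alternative that is not mentioned in the answers here, included as one possible option.

```python
import numpy as np
import pandas as pd

# Sample values taken from the question.
salary = pd.Series([78200, 66400, 6000], name="Salary")
median_salary = salary.median()

# A bare conditional expression asks for the truth value of a whole Series,
# which pandas refuses to guess, so this raises ValueError.
try:
    flag = 0 if salary < median_salary else 1
except ValueError as exc:
    print("ValueError:", exc)

# The same conditional expression, applied to one salary at a time.
flag_apply = salary.apply(lambda s: 0 if s < median_salary else 1)

# Vectorized alternative (an addition, not from the answers): numpy.where.
flag_where = np.where(salary < median_salary, 0, 1)

print(flag_apply.tolist())
print(flag_where.tolist())
```

On large frames the plain comparison with astype(int) or numpy.where is usually faster than apply, since apply calls the Python function once per row.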
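I think you want something like this (using your notation, and the Median_Compare column name so the original salaries are not overwritten).
median = df['Salary'].median()
df['Median_Compare'] = df['Salary'].apply(lambda s: 0 if s < median else 1)
This works the way it reads: for each salary, the new column is 0 if the salary is less than the median and 1 otherwise. The conditional has to be applied to one value at a time, because writing 0 if df['Salary'] < df['Salary'].median() else 1 on the whole column raises a ValueError (the truth value of a Series is ambiguous). For reference, this type of statement is known as a ternary operator (a conditional expression in Python).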
This is just using a basic conditional and storing the result in a variable.
median = 30500
salary = 50000
median_flag = 1 if salary > median else 0
print(median_flag)
1