How can I create a dummy variable in Python with a condition below or above median?

How can I create a binary dummy variable in Python that takes the value 0 when a person's salary is below the median salary and is set to 1 otherwise? I don't understand how to express the condition for a salary above or below the median.

I tried this:

df['Salary'] = (df['Salary'] > df['Salary'].median()) & (df['Salary'] < df['Salary'].median())

But there is no output.

Before that I tried this:

df['Salary'].median()
df_Salary = pd.get_dummies(df['Salary'].median())
df_new = pd.concat([df, df_Salary], axis=1)
df_new

And got this:

   Gender  Exp  Salary  74000.0
0  Female   15   78200        1
1  Female   12   66400      NaN
2  Female    3    6000      NaN
...

You can coerce the boolean result to an integer by multiplying it by an integer:

df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()) * 1
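To try the one-liner above end to end, here is a minimal sketch. It assumes the frame holds only the three rows visible in the question (the real data has more rows, so the real median differs), and the `median` variable is introduced only for readability.

```python
import pandas as pd

# Hypothetical three-row frame built from the rows shown in the question.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female"],
    "Exp": [15, 12, 3],
    "Salary": [78200, 66400, 6000],
})

median = df["Salary"].median()  # middle value of the three salaries

# The comparison yields a boolean Series; multiplying by 1 turns
# True/False into 1/0 without touching the original Salary column.
df["Median_Compare"] = (df["Salary"] >= median) * 1
print(df)
```

Writing the result to a new column keeps the original salaries available for later checks.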

You can do a vectorized comparison and convert the result to an int:

>>> df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()).astype(int)
>>> df
   Gender  Exp  Salary  Median_Compare
0  Female   15   78200               1
1  Female   12   66400               1
2  Female    3    6000               0

This works because we have the following (the row whose salary equals the median gets 1, since the comparison is >=):

>>> df["Salary"].median()
66400.0
>>> df["Salary"] >= df["Salary"].median()
0     True
1     True
2    False
Name: Salary, dtype: bool
>>> (df["Salary"] >= df["Salary"].median()).astype(int)
0    1
1    1
2    0
Name: Salary, dtype: int32
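As for the first attempt in the question: it combines "strictly above the median" and "strictly below the median" with &, and no salary can satisfy both, so every row comes out False (and the Salary column is overwritten with that result). A small sketch of the difference, assuming only the three salaries shown in the question:

```python
import pandas as pd

# Hypothetical data: just the three salaries visible in the question.
salary = pd.Series([78200, 66400, 6000], name="Salary")
median = salary.median()

# The question's expression: above AND below the median at the same time.
both = (salary > median) & (salary < median)
print(both.any())  # no row can satisfy both conditions

# A single comparison is enough; astype(int) maps True/False to 1/0.
dummy = (salary >= median).astype(int)
print(dummy)
```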

To make the ternary approaches work (X if condition else Y), you'd need to apply them element by element, because they don't play nicely with arrays, which don't have an unambiguous truth value.
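A rough sketch of that point, again assuming only the three salaries from the question. The helper name `median_flag` and the `error_message` variable are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the three salaries visible in the question.
df = pd.DataFrame({"Salary": [78200, 66400, 6000]})
median = df["Salary"].median()

# Used on the whole column, the conditional expression fails, because
# Python asks the boolean Series for one single True/False value.
error_message = None
try:
    flag = 0 if df["Salary"] < median else 1
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

def median_flag(salary):
    """Hypothetical helper: 0 for a salary below the median, 1 otherwise."""
    return 0 if salary < median else 1

# apply calls the helper once per value, so each call sees a plain number.
df["Median_Compare"] = df["Salary"].apply(median_flag)
print(df)
```

For large frames the vectorized comparison is usually the faster choice, since apply runs a Python function call per row.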

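I think you want something like this (using your notation and variable names).

median = df['Salary'].median()
df['Salary_Dummy'] = df['Salary'].apply(lambda s: 0 if s < median else 1)

This works almost exactly like it reads: for each salary, the new column (named Salary_Dummy here for illustration, so the original salaries are not overwritten) is 0 if the salary is less than the median and 1 otherwise. The expression has to be applied one value at a time, because writing 0 if df['Salary'] < df['Salary'].median() else 1 on the whole column raises a ValueError (the truth value of a Series is ambiguous). For reference, this type of expression is known as a ternary operator, or conditional expression in Python.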

This just uses a basic conditional expression and stores the result in a variable.

median = 30500
salary = 50000
# 1 when the salary is at or above the median, 0 when it is below
median_flag = 1 if salary >= median else 0
print(median_flag)  # prints 1
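The same scalar conditional can be reused for a whole list of salaries without pandas. This is only a sketch: the `median_flag` function, the `cutoff` name, and the three sample salaries (taken from the rows shown in the question) are assumptions for illustration, and >= is used so that a salary equal to the median gets 1, matching "0 below the median, 1 otherwise".

```python
from statistics import median

# Hypothetical sample: the three salaries visible in the question.
salaries = [78200, 66400, 6000]
cutoff = median(salaries)  # middle value of the sorted salaries

def median_flag(salary, cutoff):
    """Return 1 when the salary is at or above the cutoff, else 0."""
    return 1 if salary >= cutoff else 0

# Evaluate the conditional once per salary.
flags = [median_flag(s, cutoff) for s in salaries]
print(flags)
```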

