
Create new column in pandas based on value of another column

I have a dataset about the genders of various individuals. Say the dataset looks like this:

Male
Female
Male and Female
Male
Male
Female
Trans
Unknown
Male and Female

Some identify themselves as Male, some as Female, and some as both Male and Female.

Now, what I want to do is create a new column in Pandas which maps

Males to 1, 
Females to 2,
Others to 3

I wrote some code:

def gender(x):
    if x.str.contains("Male"):
        return 1
    elif x.str.contains("Female"):
        return 2
    else:
        return 3

df["Gender Values"] = df["Gender"].apply(gender)

But I was getting an error saying that there is no attribute contains. I tried removing str:

x.contains("Male")

and I was getting the same error.

Is there a better way to do this?

You can use:

def gender(x):
    if "Female" in x and "Male" in x:
        return 3
    elif "Male" in x:
        return 1
    elif "Female" in x:
        return 2
    else: return 4

df["Gender Values"] = df["Gender"].apply(gender)

print (df)
            Gender  Gender Values
0             Male              1
1           Female              2
2  Male and Female              3
3             Male              1
4             Male              1
5           Female              2
6            Trans              4
7          Unknown              4
8  Male and Female              3
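
The attempt in the question fails because Series.apply passes each cell to the function as a plain Python str, and a str has neither a .str accessor nor a contains method; .str.contains belongs to the Series itself. As a minimal sketch of a vectorized alternative (not part of the answer above), assuming the same Gender column and the same 1/2/3/4 coding, numpy.select can combine the boolean masks:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Gender": ["Male", "Female", "Male and Female", "Male",
                              "Male", "Female", "Trans", "Unknown",
                              "Male and Female"]})

# .str.contains runs on the whole Series and returns a boolean Series.
# Matching is case-sensitive, so "Male" does not match inside "Female".
has_male = df["Gender"].str.contains("Male", regex=False)
has_female = df["Gender"].str.contains("Female", regex=False)

# Conditions are evaluated in order; the first True one decides the value.
conditions = [has_male & has_female, has_male, has_female]
choices = [3, 1, 2]
df["Gender Values"] = np.select(conditions, choices, default=4)

print(df)
```

The "both" condition has to come first, for the same reason the function above checks it before the single-value cases.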

Create a mapping function, and use that to map the values.

def map_identity(identity):
    if identity.lower() == 'male':
        return 1
    elif identity.lower() == 'female':
        return 2
    else:
        return 3

df["B"] = df["A"].map(map_identity)
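
When only exact matches matter, the same mapping can also be written with a dictionary instead of a function. This is a minimal sketch, assuming the column names A and B from the snippet above; the codes dictionary is a hypothetical name introduced for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["Male", "Female", "Male and Female", "Trans", "Unknown"]})

# Hypothetical lookup table: exact (lower-cased) labels and their codes.
codes = {"male": 1, "female": 2}

# Values missing from the dictionary become NaN, so fill them with 3
# ("Others") and convert back to integers.
df["B"] = df["A"].str.lower().map(codes).fillna(3).astype(int)

print(df)
```

Like the function version, this treats "Male and Female" as "Others", because it compares the whole string rather than searching inside it.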

If there is no specific requirement to map Males, Females and Others to 1, 2 and 3 in that order, you can try LabelEncoder from Scikit-Learn. It assigns each unique category in that column an integer code from 0 to n_classes - 1, following the sorted order of the labels rather than a random one.

from sklearn import preprocessing
encoder = preprocessing.LabelEncoder()
encoder.fit(df["gender"])
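
Calling fit only learns the categories; to get the numbers into the DataFrame the column still has to be transformed. A minimal sketch, assuming a column named gender and a hypothetical output column gender_code:

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({"gender": ["Male", "Female", "Male and Female", "Trans", "Unknown"]})

encoder = preprocessing.LabelEncoder()
# fit_transform learns the categories and returns their integer codes.
df["gender_code"] = encoder.fit_transform(df["gender"])

# classes_ lists the learned categories; a label's position is its code.
print(list(encoder.classes_))
print(df)
```

Because the codes follow the sorted labels, Female gets 0 and Male gets 1 here, so this does not reproduce the 1, 2, 3 scheme from the question.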

For details, you can check the LabelEncoder documentation.

Hope this helps!

