Pandas does not change categorical data [sex] to numerical values [0/1]

I am trying to work through the Titanic dataset and want to convert the Sex column to binary values. This is my attempt:

sex = train_dataset['Sex'].replace([0,1],['female','male'],inplace=True)

But when I print(sex), the console outputs None!

I have tried other approaches from SO as well, but none of them seem to work. Below is my full source code:

import pandas as pd
from numpy import corrcoef

train_dataset = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/train.csv")
test_dataset = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv")

survived = train_dataset['Survived']
pClass = train_dataset['Pclass']

#Doesn't work
sex = train_dataset['Sex'].replace([0,1],['female','male'],inplace=True)

age = train_dataset['Age']
fare = train_dataset['Fare']
parch = train_dataset['Parch']
sibSp = train_dataset['SibSp']

# print("Correlation between parent-children & survival is: " + str(corrcoef(survived, parch)))
# print("Correlation between age & survival is: " + str(corrcoef(survived, age)))
# print("Correlation between Siblings/Spouse & survival is: " + str(corrcoef(survived, sibSp)))

print(sex)

Try:

sex = train_dataset['Sex'].replace(['female','male'],[0,1])
print(sex)

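It looks like your arguments are in the wrong order: the first argument to replace is the value (or list of values) to look for, and the second is what to put in its place. Dropping inplace=True also makes replace return the new Series instead of None. See the documentation for the replace function.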
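A minimal sketch of the corrected call, run on a small hand-made stand-in for the Titanic frame (the `toy` DataFrame and its four rows are assumptions for illustration, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for train_dataset; the real data comes from train.csv.
toy = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# First list: values to look for. Second list: their replacements.
# Without inplace=True, replace() returns a new Series.
# astype makes the integer dtype explicit rather than relying on inference.
sex = toy["Sex"].replace(["female", "male"], [0, 1]).astype("int64")

print(sex.tolist())         # [1, 0, 0, 1]
print(toy["Sex"].tolist())  # the original column is left untouched
```

Because nothing is modified in place here, the result has to be assigned to a name (or back to the column) to be kept.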

You can use np.where:

import numpy as np
dataset['sex'] = np.where(dataset['sex'] == 'female', 0, 1)
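One thing worth knowing about the np.where approach is that everything that is not 'female' becomes 1, including missing values. The sketch below uses an invented column with one missing entry (the `toy` frame is an assumption; the real Sex column may well have no gaps) and contrasts it with Series.map, which leaves unmapped entries as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value, standing in for the real column.
toy = pd.DataFrame({"Sex": ["male", "female", None, "male"]})

# np.where(condition, value_if_true, value_if_false)
encoded = np.where(toy["Sex"] == "female", 0, 1)
print(encoded.tolist())  # the missing entry is encoded as 1, just like "male"

# Series.map with an explicit dict turns anything not in the dict into NaN.
mapped = toy["Sex"].map({"female": 0, "male": 1})
print(mapped.isna().tolist())
```

Which behaviour is preferable depends on whether silently folding unexpected values into one class is acceptable for the analysis.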

Official documentation for the parameters:

inplace : bool, default False. If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.

To summarize, inplace=True returns None, and inplace=False returns a copy of the object with the operation performed.

So, in your case, because the operation uses inplace=True, the original Series object train_dataset['Sex'] is modified and the call returns None, which is what ends up in sex. Try printing train_dataset after the operation; once the replace arguments are in the correct order, you should see the modified DataFrame.

Refer to the official documentation.

There are two problems here. First, you have swapped the arguments in .replace(<replace_this>, <with_this>). Second, you are using the option inplace=True. This changes the train_dataset instance instead of returning a value.

Now that you know no value is returned when using inplace=True, it follows that sex is None. To illustrate, start with a small DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['male', 'female', 'female', 'male']})
>>> df
#         a
# 0    male
# 1  female
# 2  female
# 3    male

Without inplace=True:

When we replace the values, we get:

>>> df.replace(['female', 'male'], [0,1])
#    a
# 0  1
# 1  0
# 2  0
# 3  1

But df itself still looks exactly the same as it did before:

>>> df
#         a
# 0    male
# 1  female
# 2  female
# 3    male

So in order to replace the values in df, we assign the result back:

>>> df['a'] = df['a'].replace(['male', 'female'], [0,1])
>>> df
#    a
# 0  0
# 1  1
# 2  1
# 3  0

With inplace=True:

When you instead run df.replace(['female', 'male'], [0, 1], inplace=True) on the original df (recreated here, because the previous step already overwrote column a), df itself is modified right away:

>>> df = pd.DataFrame({'a': ['male', 'female', 'female', 'male']})
>>> df.replace(['female', 'male'], [0, 1], inplace=True)
>>> df
#    a
# 0  1
# 1  0
# 2  0
# 3  1

Note that with the inplace=True argument no value is returned:

>>> test = df.replace(['female', 'male'], [0, 1], inplace=True)
>>> type(test)
# <class 'NoneType'>
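Putting this together with the correlation step from the question, a rough sketch might look like the following. The six rows are invented purely for illustration (they are not taken from train.csv), and Series.map is used here as an alternative to replace:

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for train.csv; the values are made up.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 1],
    "Sex": ["male", "female", "female", "male", "female", "male"],
})

# Assign the encoded column back instead of relying on inplace=True.
toy["Sex"] = toy["Sex"].map({"female": 0, "male": 1})

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation
# between the two inputs.
r = np.corrcoef(toy["Survived"], toy["Sex"])[0, 1]
print("Correlation between sex & survival is: " + str(r))
```

With the real data, train_dataset would take the place of toy, and the printed value would of course differ.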
