Pandas does not change categorical data [sex] to numerical values [0/1]
I am trying to work through the Titanic dataset. I want to convert the Sex column to binary values. This is my attempt:
sex = train_dataset['Sex'].replace([0,1],['female','male'],inplace=True)
And when I try to print(sex), the console outputs None!
I have tried other approaches from Stack Overflow as well, but none of them seem to work. Below is my full source code:
import pandas as pd
from numpy import corrcoef
train_dataset = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/train.csv")
test_dataset = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv")
survived = train_dataset['Survived']
pClass = train_dataset['Pclass']
#Doesn't work
sex = train_dataset['Sex'].replace([0,1],['female','male'],inplace=True)
age = train_dataset['Age']
fare = train_dataset['Fare']
parch = train_dataset['Parch']
sibSp = train_dataset['SibSp']
# print("Correlation between parent-children & survival is: " + str(corrcoef(survived, parch)))
# print("Correlation between age & survival is: " + str(corrcoef(survived, age)))
# print("Correlation between Siblings/Spouse & survival is: " + str(corrcoef(survived, sibSp)))
print(sex)
Try:
sex = train_dataset['Sex'].replace(['female','male'],[0,1])
print(sex)
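It looks like your syntax is off: the values to look for come first and their replacements second, and without inplace=True the call returns the new Series instead of None. See the documentation for the replace function.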
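A self-contained version of the fix above, as a sketch. It uses a small hypothetical DataFrame in place of the Titanic CSV so it runs without network access, and assumes the Sex column holds only the strings 'female' and 'male':

```python
import pandas as pd

# Hypothetical stand-in for train_dataset; the real data comes from train.csv
train_dataset = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

# Values to find come first, replacements second.
# No inplace=True, so a new Series is returned and the DataFrame is untouched.
sex = train_dataset['Sex'].replace(['female', 'male'], [0, 1])
print(sex.tolist())                    # [1, 0, 0, 1]
print(train_dataset['Sex'].tolist())   # ['male', 'female', 'female', 'male']
```

To keep the converted values, assign the result back, e.g. train_dataset['Sex'] = sex.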
You can use np.where:
import numpy as np
train_dataset['Sex'] = np.where(train_dataset['Sex'] == 'female', 0, 1)
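One thing worth knowing about the np.where approach: it is a two-way split, so every value that is not exactly 'female' becomes 1, including missing values or unexpected spellings. A small sketch with hypothetical data (the None row is an assumption added to show this):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; note the missing value in the last row
df = pd.DataFrame({'Sex': ['male', 'female', 'female', None]})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise;
# None == 'female' is False, so the missing value ends up as 1
df['Sex'] = np.where(df['Sex'] == 'female', 0, 1)
print(df['Sex'].tolist())  # [1, 0, 0, 1]
```

Checking df['Sex'].unique() before converting is a cheap way to confirm there are only two categories.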
Official documentation for the parameters:
inplace : bool, default False
If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
To summarize: in practice, inplace=True modifies the object and returns None (despite the last sentence of that docstring), while inplace=False returns a copy of the object with the operation performed.
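The difference between the two modes can be seen directly on a small hypothetical Series (a minimal sketch; the data is made up for illustration):

```python
import pandas as pd

s = pd.Series(['male', 'female'])

# inplace=False (the default): returns a new Series, s is untouched
replaced = s.replace(['female', 'male'], [0, 1])
print(replaced.tolist(), s.tolist())  # [1, 0] ['male', 'female']

# inplace=True: modifies s itself and returns None
result = s.replace(['female', 'male'], [0, 1], inplace=True)
print(result, s.tolist())  # None [1, 0]
```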
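So in your case, because the operation uses inplace=True, the original Series object train_dataset['Sex'] is modified instead of a new one being returned. Print train_dataset after the operation and you should see the modified DataFrame (with the arguments in the order used in the question, though, no value matches, so nothing changes). Be aware that recent pandas versions warn about in-place calls on a column selected this way, and under Copy-on-Write they no longer update the DataFrame, so assigning the result back is the safer pattern.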
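If you do want an in-place update, one option (a sketch, not the only approach) is to call replace on the DataFrame itself with a nested dict of {column: {old: new}}. This avoids going through the intermediate train_dataset['Sex'] Series. The sample data, including the Name column, is hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic training data
train_dataset = pd.DataFrame({'Sex': ['male', 'female'], 'Name': ['A', 'B']})

# Nested dict: only the 'Sex' column is searched; the call returns None
train_dataset.replace({'Sex': {'female': 0, 'male': 1}}, inplace=True)
print(train_dataset)
```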
There are two problems here. First, you have swapped the arguments of .replace(<replace_this>, <with_this>). Second, you are using the option inplace=True, which modifies the existing object instead of returning a value.
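The first problem is easy to miss because swapped arguments do not raise an error: replace simply finds nothing to replace. A minimal sketch with hypothetical data:

```python
import pandas as pd

s = pd.Series(['male', 'female', 'female', 'male'])

# Swapped arguments: looks for 0 and 1, which do not occur, so nothing changes
unchanged = s.replace([0, 1], ['female', 'male'])
print(unchanged.equals(s))  # True

# Correct order: look for 'female'/'male', replace with 0/1
fixed = s.replace(['female', 'male'], [0, 1])
print(fixed.tolist())  # [1, 0, 0, 1]
```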
Now that you know that no value is returned when using inplace=True, you will understand why sex is None: nothing is returned.
>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['male', 'female', 'female', 'male']})
>>> df
#         a
# 0    male
# 1  female
# 2  female
# 3    male
inplace=False (the default):
Now when we replace the values, we get:
>>> df.replace(['female', 'male'], [0, 1])
#    a
# 0  1
# 1  0
# 2  0
# 3  1
But df itself still looks exactly the same as it did before:
>>> df
#         a
# 0    male
# 1  female
# 2  female
# 3    male
So in order to replace the values in df, we would assign the result back:
>>> df['a'] = df['a'].replace(['female', 'male'], [0, 1])
>>> df
#    a
# 0  1
# 1  0
# 2  0
# 3  1
inplace=True:
When you run df.replace(['female', 'male'], [0, 1], inplace=True) on the original df instead, df itself is modified right away:
>>> df.replace(['female', 'male'], [0, 1], inplace=True)
>>> df
#    a
# 0  1
# 1  0
# 2  0
# 3  1
Note that the inplace=True argument means that no value is returned:
>>> test = df.replace(['female', 'male'], [0, 1], inplace=True)
>>> type(test)
# <class 'NoneType'>
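A closely related alternative, swapped in here and not used in the answers above, is Series.map with a dict. Unlike replace, map turns any value that is missing from the dict into NaN, which makes unexpected categories easy to spot. The 'unknown' row is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'a': ['male', 'female', 'female', 'unknown']})
mapping = {'female': 0, 'male': 1}

# map: values not in the dict become NaN (so the result is float)
codes = df['a'].map(mapping)
print(codes.tolist())  # [1.0, 0.0, 0.0, nan]

# replace: values not in the dict are left as they are
print(df['a'].replace(mapping).tolist())  # [1, 0, 0, 'unknown']
```

Which behavior is preferable depends on whether an unmapped value should surface as missing data (map) or pass through silently (replace).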
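Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.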