Convert classes to numeric in a pandas dataframe
I'm doing a project based on this Kaggle dataset: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings/data and I need to put the data into a kNN model. However, this can't be done in its current state, as I need to transform the string values into integers.
get_dummies isn't ideal, as there is a lot of categorical data in the dataset and it would create thousands of columns. I am looking for a way to transform strings to numeric representations, for example:
Platform || Critic_Score || Publisher || Global_Sales
Wii || 73 || Nintendo || 53
Wii || 86 || Nintendo || 60
PC || 80 || Activision || 30
PS3 || 74 || Activision || 35
Xbox360 || 81 || 2K || 38
I'd like to transform it into this:
Platform || Critic_Score || Publisher || Global_Sales
1 || 73 || 1 || 53
1 || 86 || 1 || 60
2 || 80 || 2 || 30
3 || 74 || 2 || 35
4 || 81 || 3 || 38
I'm using Python 3.
Thanks.
I think you need factorize:
df['Platform'] = pd.factorize(df['Platform'])[0] + 1
df['Publisher'] = pd.factorize(df['Publisher'])[0] + 1
print(df)
Platform Critic_Score Publisher Global_Sales
0 1 73 1 53
1 1 86 1 60
2 2 80 2 30
3 3 74 2 35
4 4 81 3 38
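As a minimal sketch of what the `[0]` above is selecting: `pd.factorize` returns a pair, the integer codes and the unique values in order of first appearance. Keeping the second element lets you map the numbers back to the original strings later. The DataFrame here is just the five sample rows from the question, and `platform_lookup` is a hypothetical helper name, not something from the answer.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Platform": ["Wii", "Wii", "PC", "PS3", "Xbox360"],
    "Critic_Score": [73, 86, 80, 74, 81],
    "Publisher": ["Nintendo", "Nintendo", "Activision", "Activision", "2K"],
    "Global_Sales": [53, 60, 30, 35, 38],
})

# factorize returns (codes, uniques); codes start at 0, so add 1 to start at 1
codes, uniques = pd.factorize(df["Platform"])
platform_codes = codes + 1

# Hypothetical lookup table from 1-based code back to the original string
platform_lookup = dict(enumerate(uniques, start=1))

# Decode the integers again to check the round trip
decoded = [platform_lookup[int(c)] for c in platform_codes]
print(platform_codes.tolist())
print(decoded)
```

One thing to watch with the real dataset: by default factorize marks missing values with the code -1, so after adding 1 they end up as 0.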
Or, to encode several columns at once:
cols = ['Platform', 'Publisher']
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)
print(df)
Platform Critic_Score Publisher Global_Sales
0 1 73 1 53
1 1 86 1 60
2 2 80 2 30
3 3 74 2 35
4 4 81 3 38
You can use LabelEncoder [https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
# Refit the same encoder for each column before transforming it
le.fit(df['Platform'])
df['Platform'] = le.transform(df['Platform'])
le.fit(df['Publisher'])
df['Publisher'] = le.transform(df['Publisher'])
print(df)
This will transform the strings into numeric representations.
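A possible variation on the LabelEncoder approach, sketched under the assumption that you want to decode the numbers later: keep one fitted encoder per column instead of refitting a single one. The `encoders` dict is a hypothetical name, and the DataFrame is just the sample rows from the question. Note that LabelEncoder numbers the classes from 0 in sorted order, so the codes will not match the 1-based, first-appearance numbering in the question's table.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample rows copied from the question
df = pd.DataFrame({
    "Platform": ["Wii", "Wii", "PC", "PS3", "Xbox360"],
    "Critic_Score": [73, 86, 80, 74, 81],
    "Publisher": ["Nintendo", "Nintendo", "Activision", "Activision", "2K"],
    "Global_Sales": [53, 60, 30, 35, 38],
})

# Hypothetical dict holding one fitted encoder per categorical column
encoders = {}
for col in ["Platform", "Publisher"]:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])  # fit and transform in one step
    encoders[col] = le

print(df)

# Recover the original strings from the integer codes
original_platforms = encoders["Platform"].inverse_transform(df["Platform"])
print(list(original_platforms))
```

The scikit-learn documentation describes LabelEncoder as intended for target labels; for input features it points to OrdinalEncoder, which handles several columns in one call.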