Convert categorical classes to numbers in a pandas DataFrame

Question

I'm doing a project based on this Kaggle dataset: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings/data and I need to put the data into a kNN model. However, that isn't possible in its current state, because I first need to transform the string values into integers.

get_dummies isn't ideal: the dataset has a lot of categorical data, and one-hot encoding it would create thousands of columns. I am looking for a way to transform strings into numeric representations, for example:

Platform || Critic_Score || Publisher  || Global_Sales
Wii      ||      73      || Nintendo   ||  53
Wii      ||      86      || Nintendo   ||  60
PC       ||      80      || Activision ||  30
PS3      ||      74      || Activision ||  35
Xbox360  ||      81      || 2K         ||  38

I'd like to transform it into this:

Platform || Critic_Score || Publisher || Global_Sales
  1      ||      73      ||     1     ||  53
  1      ||      86      ||     1     ||  60
  2      ||      80      ||     2     ||  30
  3      ||      74      ||     2     ||  35
  4      ||      81      ||     3     ||  38
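To make the problem concrete, here is a minimal sketch that rebuilds the sample rows from the tables above as a DataFrame (the values are copied from the question, not loaded from the Kaggle file) and shows why get_dummies grows wide: it adds one indicator column per distinct value, so 4 platforms and 3 publishers already turn 4 columns into 9. With many distinct publishers in the full dataset, this is what leads to the very wide frame described in the question.

```python
import pandas as pd

# Sample rows copied from the question's example table
df = pd.DataFrame({
    "Platform": ["Wii", "Wii", "PC", "PS3", "Xbox360"],
    "Critic_Score": [73, 86, 80, 74, 81],
    "Publisher": ["Nintendo", "Nintendo", "Activision", "Activision", "2K"],
    "Global_Sales": [53, 60, 30, 35, 38],
})

# One-hot encoding: one new indicator column per distinct Platform/Publisher value
dummies = pd.get_dummies(df, columns=["Platform", "Publisher"])

print(df.shape)       # (5, 4)
print(dummies.shape)  # (5, 9): 2 numeric columns + 4 platforms + 3 publishers
print(dummies.columns.tolist())
```

The answers below instead replace each string column with a single integer column, so the width stays at 4.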

I'm using Python 3.

Thanks.

Answer 1

I think you need factorize:

import pandas as pd

# pd.factorize returns (codes, uniques); codes start at 0, so add 1
df['Platform'] = pd.factorize(df['Platform'])[0] + 1
df['Publisher'] = pd.factorize(df['Publisher'])[0] + 1
print(df)
   Platform  Critic_Score  Publisher  Global_Sales
0         1            73          1            53
1         1            86          1            60
2         2            80          2            30
3         3            74          2            35
4         4            81          3            38

Or, to encode several columns in one step:

cols = ['Platform', 'Publisher']
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)

print(df)
   Platform  Critic_Score  Publisher  Global_Sales
0         1            73          1            53
1         1            86          1            60
2         2            80          2            30
3         3            74          2            35
4         4            81          3            38
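Because pd.factorize also returns the distinct values, you can keep them to translate codes back into labels later. The sketch below uses a hypothetical Publisher column with one missing value, since real data often has gaps: by default factorize gives missing values the code -1, so after the +1 shift used above they become 0. The `mapping` dictionary is an illustrative helper, not part of the answer.

```python
import pandas as pd

# Hypothetical Publisher column with one missing value
publisher = pd.Series(["Nintendo", "Activision", None, "Nintendo", "2K"])

# factorize returns integer codes and the distinct values they stand for
codes, uniques = pd.factorize(publisher)
# codes   -> 0, 1, -1, 0, 2  (the missing value gets the sentinel -1)
# uniques -> 'Nintendo', 'Activision', '2K'  (order of first appearance)

# Shift to 1-based codes as in the answer; the missing value becomes 0
encoded = pd.Series(codes + 1)

# Keep a code -> label mapping so results can be translated back later
mapping = dict(enumerate(uniques, start=1))
decoded = encoded.map(mapping)  # code 0 has no entry, so it maps back to NaN

print(encoded.tolist())  # [1, 2, 0, 1, 3]
print(mapping)
print(decoded.tolist())
```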

Answer 2

You can use LabelEncoder [https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html]:

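from sklearn import preprocessing

# Use one encoder per column so each keeps its own mapping (classes_)
le_platform = preprocessing.LabelEncoder()
df['Platform'] = le_platform.fit_transform(df['Platform'])
le_publisher = preprocessing.LabelEncoder()
df['Publisher'] = le_publisher.fit_transform(df['Publisher'])
print(df)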

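This transforms the strings into numeric codes. Note that LabelEncoder numbers the classes from 0 in sorted (alphabetical) order, so the result differs from the 1-based, order-of-appearance numbering in the question's example.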
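If you would rather stay in pandas, a possible alternative (a different technique from LabelEncoder, sketched here under the assumption that the columns have no missing values) is pd.factorize with sort=True. It numbers the distinct values from 0 in sorted order, which matches LabelEncoder's numbering for such data, and it handles several columns in one loop. The `mappings` dictionary is a hypothetical helper for decoding.

```python
import pandas as pd

# Sample rows copied from the question's example table
df = pd.DataFrame({
    "Platform": ["Wii", "Wii", "PC", "PS3", "Xbox360"],
    "Critic_Score": [73, 86, 80, 74, 81],
    "Publisher": ["Nintendo", "Nintendo", "Activision", "Activision", "2K"],
    "Global_Sales": [53, 60, 30, 35, 38],
})

cols = ["Platform", "Publisher"]
encoded = df.copy()
mappings = {}
for col in cols:
    # sort=True assigns codes 0, 1, 2, ... to the distinct values in sorted order
    codes, uniques = pd.factorize(encoded[col], sort=True)
    encoded[col] = codes
    mappings[col] = list(uniques)  # position i holds the label for code i

print(encoded)   # Platform -> 2, 2, 0, 1, 3; Publisher -> 2, 2, 1, 1, 0
print(mappings)
```

Numeric columns such as Critic_Score are not in `cols`, so they are left unchanged.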


Answer 1: 7 votes, accepted, 2017-12-01 14:21:32
Answer 2: 4 votes, 2020-03-15 19:05:47