
Convert classes to numeric in a pandas dataframe

I'm doing a project based on this Kaggle dataset: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings/data. I need to feed the data into a kNN model, but that can't be done in its current state because the string values first have to be converted to integers.

get_dummies isn't ideal, as the dataset contains a lot of categorical data and it would create thousands of columns. I am looking for a way to transform strings to numeric representations, for example:

Platform || Critic_Score || Publisher || Global_Sales
Wii      ||      73      ||  Nintendo ||  53
Wii      ||      86      ||  Nintendo ||  60
PC       ||      80      ||Activision ||  30
PS3      ||      74      ||Activision ||  35
Xbox360  ||      81      ||   2K      ||  38

I'd like to transform it into this:

Platform || Critic_Score || Publisher || Global_Sales
  1      ||      73      ||     1     ||  53
  1      ||      86      ||     1     ||  60
  2      ||      80      ||     2     ||  30
  3      ||      74      ||     2     ||  35
  4      ||      81      ||     3     ||  38

I'm using Python 3.

Thanks.

I think you need factorize, which numbers the distinct values from 0 in order of first appearance (hence the + 1 below):

import pandas as pd

# [0] selects the integer codes; + 1 makes them start at 1 instead of 0
df['Platform'] = pd.factorize(df['Platform'])[0] + 1
df['Publisher'] = pd.factorize(df['Publisher'])[0] + 1
print(df)
   Platform  Critic_Score  Publisher  Global_Sales
0         1            73          1            53
1         1            86          1            60
2         2            80          2            30
3         3            74          2            35
4         4            81          3            38

To encode several columns at once, apply factorize to each of them:

cols = ['Platform', 'Publisher']
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)

print(df)
   Platform  Critic_Score  Publisher  Global_Sales
0         1            73          1            53
1         1            86          1            60
2         2            80          2            30
3         3            74          2            35
4         4            81          3            38
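If the original labels are needed again later, the second return value of factorize can be kept. The following is only a minimal sketch built on the sample rows from the question; `platform_lookup` is a hypothetical name introduced here for illustration, and the last lines assume the default missing-value handling of factorize, where missing entries receive the code -1.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Platform": ["Wii", "Wii", "PC", "PS3", "Xbox360"],
    "Critic_Score": [73, 86, 80, 74, 81],
    "Publisher": ["Nintendo", "Nintendo", "Activision", "Activision", "2K"],
    "Global_Sales": [53, 60, 30, 35, 38],
})

# factorize returns (codes, uniques); codes are 0-based positions into uniques.
codes, uniques = pd.factorize(df["Platform"])
df["Platform"] = codes + 1

# Hypothetical lookup table: 1-based code -> original label.
platform_lookup = dict(enumerate(uniques, start=1))

# With the default settings, missing values get the code -1,
# so after "+ 1" they would show up as 0.
codes_with_nan, _ = pd.factorize(pd.Series(["Wii", None, "PC"]))
```

With this, `platform_lookup[2]` gives back 'PC', and a 0 in an encoded column would indicate a value that was missing in the source data.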

You can use LabelEncoder [https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html]:

from sklearn import preprocessing

# fit_transform learns the labels and encodes the column in one step
le = preprocessing.LabelEncoder()
df['Platform'] = le.fit_transform(df['Platform'])
# reusing le refits it, so the Platform classes are no longer stored
df['Publisher'] = le.fit_transform(df['Publisher'])
print(df)

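This transforms the strings into numeric representations. Note that LabelEncoder numbers the classes from 0 in sorted order, so the codes differ from the 1-based, order-of-appearance codes shown in the question.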
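The scikit-learn documentation describes LabelEncoder as intended for target labels and offers OrdinalEncoder for feature columns. As a possible alternative, and not what the answer above uses, here is a minimal sketch with OrdinalEncoder on the sample rows from the question. It encodes both columns with one fitted object and can decode them again.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Sample rows copied from the question.
df = pd.DataFrame({
    "Platform": ["Wii", "Wii", "PC", "PS3", "Xbox360"],
    "Critic_Score": [73, 86, 80, 74, 81],
    "Publisher": ["Nintendo", "Nintendo", "Activision", "Activision", "2K"],
    "Global_Sales": [53, 60, 30, 35, 38],
})

cols = ["Platform", "Publisher"]

# One encoder handles all listed columns and stores one category list per column.
encoder = OrdinalEncoder()
df[cols] = encoder.fit_transform(df[cols]).astype(int)

# Codes are 0-based and follow the sorted order of the labels in each column.
# inverse_transform recovers the original strings from the codes.
decoded = encoder.inverse_transform(df[cols].to_numpy())
```

Here `encoder.categories_` holds the sorted labels per column, so the code of a label is its position in that list.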
