Replace each unique value in a sorted pandas DataFrame column with an integer i

Question

I have a pandas DataFrame with a column of user IDs, each about 40 characters long. To save space, I want to replace each distinct user ID with an integer i, starting from 0.

What I have:

userID      itemID
------------------
3a            r5
3a            r6
4b            r5
4c            r6

What I need:

userID      itemID
------------------
0            r5
0            r6
1            r5
2            r6
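For comparison, the mapping described above can also be built by hand. This is a minimal sketch, assuming the column names userID and itemID from the example: it numbers each distinct ID in order of first appearance by enumerating Series.unique(), then applies the resulting dictionary with Series.map. The name id_to_int is chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "userID": ["3a", "3a", "4b", "4c"],
    "itemID": ["r5", "r6", "r5", "r6"],
})

# unique() returns the distinct IDs in order of first appearance
id_to_int = {uid: i for i, uid in enumerate(df["userID"].unique())}

# Replace every ID with its number (hypothetical helper dict id_to_int)
df["userID"] = df["userID"].map(id_to_int)
print(df)
```

Keeping id_to_int around lets you translate the integers back to the original IDs later.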

Answer 1

Use pd.factorize():

In [145]: df
Out[145]:
  userID itemID
0     3a     r5
1     3a     r6
2     4b     r5
3     4c     r6

In [146]: df.userID = pd.factorize(df.userID)[0]

In [147]: df
Out[147]:
   userID itemID
0       0     r5
1       0     r6
2       1     r5
3       2     r6
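pd.factorize returns a pair: the integer codes and the distinct values they stand for, so the [0] above keeps only the codes. The sketch below uses a small, deliberately unsorted column (made-up data) to illustrate two points. By default, IDs are numbered in order of first appearance, which matches sorted order only if the column is already sorted. Passing sort=True numbers them in sorted order instead. The uniques also let you map the codes back to the original IDs.

```python
import pandas as pd

# Made-up, unsorted example data
df = pd.DataFrame({
    "userID": ["4c", "3a", "3a", "4b"],
    "itemID": ["r6", "r5", "r6", "r5"],
})

# codes: one integer per row; uniques: the distinct IDs, positioned by code
codes, uniques = pd.factorize(df["userID"])
print(codes)  # first-appearance order: 4c -> 0, 3a -> 1, 4b -> 2

# sort=True sorts the uniques and renumbers the codes to match
sorted_codes, sorted_uniques = pd.factorize(df["userID"], sort=True)
df["userID"] = sorted_codes

# Map the integer codes back to the original IDs
restored = sorted_uniques.take(df["userID"].to_numpy())
print(list(restored))
```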

If your main goal is to save memory, you can convert the column to the category dtype:

In [155]: df = pd.concat([df] * 5, ignore_index=True)

In [156]: df
Out[156]:
   userID itemID
0      3a     r5
1      3a     r6
2      4b     r5
3      4c     r6
4      3a     r5
5      3a     r6
6      4b     r5
7      4c     r6
8      3a     r5
9      3a     r6
10     4b     r5
11     4c     r6
12     3a     r5
13     3a     r6
14     4b     r5
15     4c     r6
16     3a     r5
17     3a     r6
18     4b     r5
19     4c     r6

In [157]: df.memory_usage()
Out[157]:
Index      80
userID    160
itemID    160
dtype: int64

Converting userID to category:

In [158]: df.userID = df.userID.astype('category')

In [159]: df.memory_usage()
Out[159]:
Index      80
userID     44    # <------------ NOTE:
itemID    160
dtype: int64

In [160]: df
Out[160]:
   userID itemID
0      3a     r5
1      3a     r6
2      4b     r5
3      4c     r6
4      3a     r5
5      3a     r6
6      4b     r5
7      4c     r6
8      3a     r5
9      3a     r6
10     4b     r5
11     4c     r6
12     3a     r5
13     3a     r6
14     4b     r5
15     4c     r6
16     3a     r5
17     3a     r6
18     4b     r5
19     4c     r6

In [161]: df.dtypes
Out[161]:
userID    category
itemID      object
dtype: object
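The category dtype already stores the integers the question asks for, so you can get both the memory savings and the numbering. This is a minimal sketch on the same 20-row frame. cat.codes exposes the per-row integer codes and cat.categories the distinct IDs. Because astype('category') infers the categories in sorted order, the codes follow sorted order. Note that the default memory_usage() counts only the 8-byte pointers of an object column, not the strings they point to. Passing deep=True gives a fuller picture of the savings with real 40-character IDs.

```python
import pandas as pd

df = pd.DataFrame({
    "userID": ["3a", "3a", "4b", "4c"] * 5,
    "itemID": ["r5", "r6", "r5", "r6"] * 5,
})
df["userID"] = df["userID"].astype("category")

# Each row holds a small integer code; each distinct ID is stored once
codes = df["userID"].cat.codes
print(codes.dtype)  # int8, since there are only 3 categories
print(list(df["userID"].cat.categories))

# deep=True also counts the Python string objects behind object columns
print(df.memory_usage(deep=True))
```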

Answer 1: score 3, accepted, 2017-03-02 23:47:06

用i替换已排序的Pandas数据框列中的每个唯一值

问题描述

1 个解决方案

解决方案1 3 已采纳 2017-03-02 23:47:06

解决方案1
3 已采纳 2017-03-02 23:47:06