
Convert strings to integers in a pandas DataFrame

I have a dataset like the one below:

Name    ARowss  TotalRowss  Percentage
motors      11          11         100
trck1        2           2         100
trck2        2           2         100
hydr1        4           4         100
gas1         2           2         100

I am doing some data cleanup, and as part of it I need to assign a new number to each value in the "Name" column. All values in "Name" are unique. So, in the dataset above, "motors" should get 1, "trck1" should get 2, "trck2" should get 3, and so on.

Is this what you want?

In [5]: df['id'] = pd.factorize(df.Name)[0]

In [6]: df
Out[6]:
     Name  ARowss  TotalRowss  Percentage  id
0  motors      11          11         100   0
1   trck1       2           2         100   1
2   trck2       2           2         100   2
3   hydr1       4           4         100   3
4    gas1       2           2         100   4
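`pd.factorize` returns a pair: an integer array of codes (one per row, assigned in order of first appearance and starting at 0) and the distinct values in that same order. That is why the answer takes `[0]`. A minimal sketch, assuming the sample data is rebuilt from the question's table:

```python
import pandas as pd

# Rebuild the question's sample data (values copied from the table above)
df = pd.DataFrame({
    "Name": ["motors", "trck1", "trck2", "hydr1", "gas1"],
    "ARowss": [11, 2, 2, 4, 2],
    "TotalRowss": [11, 2, 2, 4, 2],
    "Percentage": [100, 100, 100, 100, 100],
})

# factorize returns (codes, uniques)
codes, uniques = pd.factorize(df["Name"])
print(codes)    # [0 1 2 3 4]
print(uniques)  # the distinct names, in order of first appearance

# Store the 0-based codes in a new column, as in the answer
df["id"] = codes
```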

Or this, depending on your goals (replacing `Name` in place and numbering from 1):

In [10]: df.Name = pd.factorize(df.Name)[0] + 1

In [11]: df
Out[11]:
   Name  ARowss  TotalRowss  Percentage
0     1      11          11         100
1     2       2           2         100
2     3       2           2         100
3     4       4           4         100
4     5       2           2         100
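Overwriting `Name` discards the original strings. If you may need them later, one option is to keep the `uniques` array that `pd.factorize` also returns and build a lookup from it. A sketch under that assumption; `number_to_name` is a hypothetical name chosen for illustration:

```python
import pandas as pd

# Subset of the question's data, enough to show the round trip
df = pd.DataFrame({
    "Name": ["motors", "trck1", "trck2", "hydr1", "gas1"],
    "ARowss": [11, 2, 2, 4, 2],
})

codes, uniques = pd.factorize(df["Name"])
df["Name"] = codes + 1  # 1-based numbers, as in the answer

# Hypothetical lookup: new number -> original name
number_to_name = dict(enumerate(uniques, start=1))

# Map the numbers back to recover the original column
restored = df["Name"].map(number_to_name)
```

Because `uniques` is ordered by first appearance, position `i` (counting from 1) holds exactly the name that was given number `i`.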

It also works for non-unique values; rows with the same name receive the same number:

In [15]: df
Out[15]:
     Name  ARowss  TotalRowss  Percentage
0  motors      11          11         100
1   trck1       2           2         100
2   trck2       2           2         100
3   hydr1       4           4         100
4    gas1       2           2         100  # duplicates in `Name`
5    gas1       2           3         111  # 

In [16]: df.Name = pd.factorize(df.Name)[0] + 1

In [17]: df
Out[17]:
   Name  ARowss  TotalRowss  Percentage
0     1      11          11         100
1     2       2           2         100
2     3       2           2         100
3     4       4           4         100
4     5       2           2         100  #
5     5       2           3         111  # 
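By default the numbers follow the order in which names first appear. If you would rather number names alphabetically, `pd.factorize` accepts `sort=True`; a `groupby(...).ngroup()` call with `sort=False` reproduces the default numbering. A minimal sketch using the duplicated sample data above (only two columns rebuilt, as an assumption for brevity):

```python
import pandas as pd

# Sample data with a duplicate name, as in the answer above
df = pd.DataFrame({
    "Name": ["motors", "trck1", "trck2", "hydr1", "gas1", "gas1"],
    "TotalRowss": [11, 2, 2, 4, 2, 3],
})

# Default: order of first appearance; duplicates share a number
by_appearance = pd.factorize(df["Name"])[0] + 1

# sort=True: numbers follow the sorted order of the distinct names
by_sorted_name = pd.factorize(df["Name"], sort=True)[0] + 1

# Same result as the default, via groupby (sort=False keeps appearance order)
by_groupby = df.groupby("Name", sort=False).ngroup() + 1
```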
