
Replacing multiple string values in a column with numbers in pandas

I am currently working on a data frame in pandas named df. One column contains multiple labels (more than 100, to be exact).

I know how to replace values when there are only a few of them.

For instance, in the typical Titanic example:

titanic.Sex.replace({'male': 0,'female': 1}, inplace=True)

Of course, doing so for 100+ values would be extremely time-consuming. I have seen similar questions, but all the answers involve typing out the mapping by hand. Is there a faster way to do this?

I think you're looking for factorize:

import pandas as pd

df = pd.DataFrame({'col': list('ABDCEBJZACA')})
# factorize() returns (codes, uniques); [0] keeps only the integer codes
df['factor'] = df['col'].factorize()[0]

Output:

   col  factor
0    A       0
1    B       1
2    D       2
3    C       3
4    E       4
5    B       1
6    J       5
7    Z       6
8    A       0
9    C       3
10   A       0
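If the label-to-number mapping itself is needed, for example to apply the same encoding to another data frame or to decode the numbers later, the second value returned by factorize holds the distinct labels. The following is only a minimal sketch: the names codes, uniques, mapping and restored are illustrative, and it assumes the column has no missing values (by default pandas assigns those the code -1).

```python
import pandas as pd

# Small illustrative column; in practice this would be the 100+ label column.
df = pd.DataFrame({'col': list('ABDCEBJZACA')})

# pd.factorize returns (codes, uniques): integer codes assigned in order of
# first appearance, plus the distinct labels indexed by those codes.
codes, uniques = pd.factorize(df['col'])
df['factor'] = codes

# Label -> number dictionary, built without typing any label by hand.
mapping = {label: code for code, label in enumerate(uniques)}

# Decode: look the original labels back up from the codes
# (assumes no -1 codes, i.e. no missing values in the column).
restored = uniques.take(codes)
```

The resulting mapping can then be passed to replace or map on another column that uses the same labels.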

