How do I convert all strings (like "Fault") to a unique float?

Question

I have a DataFrame that contains int, float, and object (string) items. I want a unique float for every unique object, like so:

Exhaust
Fault
Probation
Exhaust
Fault
Motor

to

1.
2.
3.
1.
2.
4.

Also, will it work on all of the columns at once, or would I have to go column by column?

Last question: will it also convert all the ints to floats?

Answer 1

As mentioned by Jon, you could make use of Series.factorize.

(s.factorize()[0]+1).astype('float')

To perform this column-wise over an entire DataFrame, just use apply.
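As a rough sketch of the apply approach, the snippet below factorizes each non-numeric column and casts numeric columns to float, which also covers the int-to-float part of the question. The DataFrame contents, the column names, and the helper name to_unique_floats are made up for illustration; they are not from the original post.

```python
import pandas as pd

# Hypothetical example data: one string column, one int column, one float column.
df = pd.DataFrame({
    "status": ["Exhaust", "Fault", "Probation", "Exhaust", "Fault", "Motor"],
    "count": [3, 1, 4, 1, 5, 9],
    "reading": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})

def to_unique_floats(col):
    """Hypothetical helper: cast numeric columns to float, factorize the rest."""
    if pd.api.types.is_numeric_dtype(col):
        # ints (and floats) simply become float64
        return col.astype("float")
    # factorize() returns (codes, uniques); codes start at 0, so shift by 1
    codes = col.factorize()[0] + 1
    return pd.Series(codes, index=col.index, dtype="float")

# apply() calls the helper once per column
result = df.apply(to_unique_floats)
print(result)
```

Note that each column is numbered independently here, so the same string appearing in two different columns may receive different floats.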

Demo

>>> import pandas as pd
>>> s = pd.Series(['Exhaust', 'Fault', 'Probation', 5, int,
                   'Exhaust', int, 'Fault', 'Motor'])

>>> s
0          Exhaust
1            Fault
2        Probation
3                5
4    <class 'int'>
5          Exhaust
6    <class 'int'>
7            Fault
8            Motor
dtype: object

>>> (s.factorize()[0]+1).astype('float')
array([ 1.,  2.,  3.,  4.,  5.,  1.,  5.,  2.,  6.])

A NumPy solution may be to use the return_inverse keyword argument of np.unique,

(np.unique(s, return_inverse=True)[1]+1).astype('float')

however, from some rough benchmarking, the Pandas solution may be faster.
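One difference worth keeping in mind: np.unique returns its unique values sorted, so the floats follow sorted order, while factorize numbers values in order of first appearance. The comparison below is a small sketch on string-only data taken from the question; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# String-only sample from the question (np.unique needs sortable values)
values = np.array(["Exhaust", "Fault", "Probation", "Exhaust", "Fault", "Motor"])

# Pandas: codes follow order of first appearance
pandas_codes = (pd.Series(values).factorize()[0] + 1).astype("float")

# NumPy: codes follow the sorted order of the unique values
numpy_codes = (np.unique(values, return_inverse=True)[1] + 1).astype("float")

print(pandas_codes)  # Probation gets 3, Motor gets 4
print(numpy_codes)   # Motor sorts before Probation, so Motor gets 3, Probation gets 4
```

Both results assign one float per distinct string; only the numbering order differs.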


Solution 1
4 Accepted 2017-02-25 16:56:39
