How do I convert all strings (like "Fault") into unique floats?
I have a DataFrame that has int, float, and object (strings with characters) items in it. I want a unique float for every unique object, like so:
Exhaust
Fault
Probation
Exhaust
Fault
Motor
to
1.
2.
3.
1.
2.
4.
Also, will it work on all of the columns at once, or would I have to go column by column?
Last question: will it also convert all the ints to floats?
As mentioned by Jon, you could make use of Series.factorize, which returns a pair (codes, uniques) with integer codes starting at 0, hence the [0] and the +1 below:
(s.factorize()[0]+1).astype('float')
To perform this column-wise over an entire DataFrame, just use apply.
Demo
>>> s = pd.Series(['Exhaust', 'Fault', 'Probation', 5, int,
'Exhaust', int, 'Fault', 'Motor'])
>>> s
0 Exhaust
1 Fault
2 Probation
3 5
4 <class 'int'>
5 Exhaust
6 <class 'int'>
7 Fault
8 Motor
dtype: object
>>> (s.factorize()[0]+1).astype('float')
array([ 1., 2., 3., 4., 5., 1., 5., 2., 6.])
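The column-wise use of apply mentioned above could be sketched roughly as follows. The DataFrame, its column names, and the helper to_codes are made up for illustration; they are not from the original question. The sketch also touches on the int question: applied to every column, factorize replaces numeric values with codes as well, so you may prefer to encode only the non-numeric columns.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "status": ["Exhaust", "Fault", "Probation", "Exhaust", "Fault", "Motor"],
    "count": [3, 1, 4, 1, 5, 9],
    "reading": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})

def to_codes(col):
    """Map each distinct value in a column to a float code: 1.0, 2.0, ..."""
    # factorize returns (codes, uniques); codes start at 0, so shift by 1.
    return pd.Series(col.factorize()[0] + 1, index=col.index, dtype="float")

# Option 1: encode every column independently.
# Numeric columns are replaced by codes too, so their original values are lost.
all_coded = df.apply(to_codes)

# Option 2: encode only the non-numeric columns and pass numbers through unchanged.
mixed = df.apply(
    lambda col: col if pd.api.types.is_numeric_dtype(col) else to_codes(col)
)

print(all_coded)
print(mixed)
```

Note that each column gets its own numbering here, so the same string appearing in two different columns is not guaranteed to receive the same code.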
A NumPy solution may be to use the return_inverse keyword argument of np.unique,
(np.unique(s, return_inverse=True)[1]+1).astype('float')
however, from some rough benchmarking, the pandas solution may be faster.
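One difference worth keeping in mind when choosing between the two: np.unique sorts the unique values, so its codes follow sorted order, whereas factorize numbers values in order of first appearance. The small sketch below, using a made-up list of labels, shows the two numberings side by side. Because np.unique needs to sort, it may also raise a TypeError on object data that mixes types which cannot be compared with each other.

```python
import numpy as np
import pandas as pd

# Hypothetical labels chosen so that sorted order differs from appearance order.
labels = ["Motor", "Exhaust", "Fault", "Exhaust", "Motor"]

# np.unique sorts the unique values: Exhaust -> 1, Fault -> 2, Motor -> 3.
np_codes = (np.unique(labels, return_inverse=True)[1] + 1).astype("float")

# factorize numbers by first appearance: Motor -> 1, Exhaust -> 2, Fault -> 3.
pd_codes = (pd.Series(labels).factorize()[0] + 1).astype("float")

print(np_codes)
print(pd_codes)
```

Both results give one distinct float per distinct string; only the assignment of numbers to strings differs.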