Scikit-learn's LabelEncoder is showing some puzzling behavior in my Jupyter notebook, as in:
from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. This is odd; shouldn't it print ['zero' 'zero' 'zero' 'one' 'one' 'one']? Then I tried
le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print(le3.inverse_transform([0, 0, 0, 1, 1, 1]))
which also prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. Perhaps some alphabetization was happening? Next, I tried
le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print(le4.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['nil' 'nil' 'nil' 'one' 'one' 'one']!
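I also looked at the fitted classes_ attribute directly (re-fitting the same toy encoders here so the snippet runs on its own), and both encoders report the same ordering no matter which label came first:

```python
from sklearn.preprocessing import LabelEncoder

le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print(le2.classes_)  # -> ['one' 'zero']

le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print(le3.classes_)  # -> ['one' 'zero'] again, input order ignored
```

So inverse_transform maps 0 and 1 against this sorted classes_ array, not against the order I passed to fit.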
I've spent several hours on this. FWIW, the example in the documentation works as expected, so I suspect there is a flaw in how I expect inverse_transform to work.
In case it is relevant, I'm using IPython 7.7.0, numpy 1.17.3, and scikit-learn 0.21.3.
The thing is that LabelEncoder.fit() always stores the classes in sorted order, because internally it uses np.unique, which returns sorted unique values (you can see this in the scikit-learn source code).
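You can verify that np.unique is the culprit in isolation; it sorts its result regardless of input order:

```python
import numpy as np

# np.unique returns the unique values sorted,
# no matter what order they appear in the input
print(np.unique(['zero', 'one']))  # -> ['one' 'zero']
print(np.unique(['one', 'zero']))  # -> ['one' 'zero']
```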
I guess the only way to do what you want is to create your own fit method and override the original one from LabelEncoder. You just need to reuse the existing code; here's an example:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d
class MyLabelEncoder(LabelEncoder):
    def fit(self, y):
        y = column_or_1d(y, warn=True)
        # pd.Series.unique() keeps first-appearance order instead of sorting
        self.classes_ = pd.Series(y).unique()
        return self

le2 = MyLabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
gives you:
['zero' 'zero' 'zero' 'one' 'one' 'one']
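For completeness, the subclass also round-trips in the forward direction (repeating the class definition here so the snippet runs on its own):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

class MyLabelEncoder(LabelEncoder):
    def fit(self, y):
        y = column_or_1d(y, warn=True)
        # keep first-appearance order instead of np.unique's sorted order
        self.classes_ = pd.Series(y).unique()
        return self

le = MyLabelEncoder()
le.fit(['zero', 'one'])
print(le.transform(['zero', 'one']))      # -> [0 1]
print(le.inverse_transform([0, 0, 1]))    # -> ['zero' 'zero' 'one']
```

Note that only fit is overridden; transform and inverse_transform are inherited unchanged and simply work against the reordered classes_ array.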