Scikit-learn's LabelEncoder is showing some puzzling behavior in my Jupyter notebook, as in:
from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. This is odd; shouldn't it print ['zero' 'zero' 'zero' 'one' 'one' 'one']? Then I tried
le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print(le3.inverse_transform([0, 0, 0, 1, 1, 1]))
which also prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. Perhaps some alphabetization was happening? Next, I tried
le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print(le4.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['nil' 'nil' 'nil' 'one' 'one' 'one']!
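I also looked at the fitted classes_ attribute directly (re-fitting the same toy encoders here so the snippet runs on its own), and both encoders report the same ordering no matter which label came first:

```python
from sklearn.preprocessing import LabelEncoder

le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print(le2.classes_)  # -> ['one' 'zero']

le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print(le3.classes_)  # -> ['one' 'zero'] again, input order ignored
```

So inverse_transform maps 0 and 1 against this sorted classes_ array, not against the order I passed to fit.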
I've spent several hours on this. FWIW, the example in the documentation works as expected, so I suspect there is a flaw in how I expect inverse_transform to work.
In case it is relevant, I'm using IPython 7.7.0, numpy 1.17.3, and scikit-learn 0.21.3.
The thing is that LabelEncoder.fit() always stores the classes in sorted order, because internally it uses np.unique, which returns sorted unique values (you can see this in the scikit-learn source code).
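You can verify that np.unique is the culprit in isolation; it sorts its result regardless of input order:

```python
import numpy as np

# np.unique returns the unique values sorted,
# no matter what order they appear in the input
print(np.unique(['zero', 'one']))  # -> ['one' 'zero']
print(np.unique(['one', 'zero']))  # -> ['one' 'zero']
```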
I guess the only way to do what you want is to create your own fit method and override the original one from LabelEncoder. You just need to reuse the existing code; here's an example:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d
class MyLabelEncoder(LabelEncoder):
    def fit(self, y):
        y = column_or_1d(y, warn=True)
        # pd.Series.unique() keeps first-appearance order instead of sorting
        self.classes_ = pd.Series(y).unique()
        return self

le2 = MyLabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
gives you:
['zero' 'zero' 'zero' 'one' 'one' 'one']
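For completeness, the subclass also round-trips in the forward direction (repeating the class definition here so the snippet runs on its own):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

class MyLabelEncoder(LabelEncoder):
    def fit(self, y):
        y = column_or_1d(y, warn=True)
        # keep first-appearance order instead of np.unique's sorted order
        self.classes_ = pd.Series(y).unique()
        return self

le = MyLabelEncoder()
le.fit(['zero', 'one'])
print(le.transform(['zero', 'one']))      # -> [0 1]
print(le.inverse_transform([0, 0, 1]))    # -> ['zero' 'zero' 'one']
```

Note that only fit is overridden; transform and inverse_transform are inherited unchanged and simply work against the reordered classes_ array.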