Is it possible to apply sklearn.preprocessing.LabelEncoder() to a 2D list?
Say I have a list like the one below:
l = [
['PER', 'O', 'O', 'GEO'],
['ORG', 'O', 'O', 'O'],
['O', 'O', 'O', 'GEO'],
['O', 'O', 'PER', 'O']
]
I want to encode the 2D list with LabelEncoder().
The result should look something like this:
l = [
[1, 0, 0, 2],
[3, 0, 0, 0],
[0, 0, 0, 2],
[0, 0, 1, 0]
]
Is it possible? If not, is there a workaround?
Thanks in advance!
You can flatten the list, fit the encoder on all the values that can occur, and then use the fitted encoder to transform each sublist, as shown below:
from sklearn.preprocessing import LabelEncoder
l = [
['PER', 'O', 'O', 'GEO'],
['ORG', 'O', 'O', 'O'],
['O', 'O', 'O', 'GEO'],
['O', 'O', 'PER', 'O']
]
flattened_l = [e for sublist in l for e in sublist]
# flattened_l is ['PER', 'O', 'O', 'GEO', 'ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'GEO', 'O', 'O', 'PER', 'O']
le = LabelEncoder().fit(flattened_l)
# See the mapping generated by the encoder:
list(enumerate(le.classes_))
# [(0, 'GEO'), (1, 'O'), (2, 'ORG'), (3, 'PER')]
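# Finally, transform each sublist (.tolist() converts NumPy integers to plain Python ints):
res = [le.transform(sublist).tolist() for sublist in l]
res
# The encoded result. LabelEncoder numbers the classes in sorted order,
# so the integers differ from the ones in the question, but the mapping is consistent:
# [[3, 1, 1, 0], [2, 1, 1, 1], [1, 1, 1, 0], [1, 1, 3, 1]]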
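If every sublist has the same length, the flatten-and-transform idea above can also be done in one step with NumPy, by encoding the flattened array and reshaping it back. The sketch below assumes a rectangular list, which holds for the example but not necessarily for real data. It also shows a plain dictionary lookup for the case where the exact integers from the question are needed. The `wanted` mapping is a hypothetical name, with its values read off the desired output in the question, because LabelEncoder always assigns integers in sorted class order.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

l = [
    ['PER', 'O', 'O', 'GEO'],
    ['ORG', 'O', 'O', 'O'],
    ['O', 'O', 'O', 'GEO'],
    ['O', 'O', 'PER', 'O'],
]

# Assumption: all sublists have the same length, so the list forms a 2D array.
arr = np.array(l)

# Encode the flattened array, then restore the original 2D shape.
le = LabelEncoder()
encoded = le.fit_transform(arr.ravel()).reshape(arr.shape)
encoded_list = encoded.tolist()

# The same encoder maps the integers back to the original labels.
decoded = le.inverse_transform(encoded.ravel()).reshape(arr.shape).tolist()

# LabelEncoder numbers classes in sorted order. To get specific integers
# instead, an explicit mapping is enough (hypothetical name, values taken
# from the desired output in the question).
wanted = {'O': 0, 'PER': 1, 'GEO': 2, 'ORG': 3}
custom = [[wanted[tag] for tag in row] for row in l]

print(encoded_list)
print(custom)
```

For ragged lists (sublists of different lengths), the per-sublist transform from the answer above remains the safer choice, since `np.array` will not build a regular 2D array from them.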