Is it possible to apply sklearn.preprocessing.LabelEncoder() to a 2D list?

Question

Say I have a list as given below:

l = [
    ['PER', 'O', 'O', 'GEO'],
    ['ORG', 'O', 'O', 'O'],
    ['O', 'O', 'O', 'GEO'],
    ['O', 'O', 'PER', 'O']
]

I want to encode the 2D list with LabelEncoder().

It should look something like:

l = [
    [1, 0, 0, 2],
    [3, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 1, 0]
]

Is it possible? If not, is there any workaround?

Thanks in advance!

Answer 1

You can flatten the list, fit the encoder on all the values that occur, and then use the encoder to transform each sublist, as shown below:

from sklearn.preprocessing import LabelEncoder

l = [
    ['PER', 'O', 'O', 'GEO'],
    ['ORG', 'O', 'O', 'O'],
    ['O', 'O', 'O', 'GEO'],
    ['O', 'O', 'PER', 'O']
]

flattened_l = [e for sublist in l for e in sublist]

# flattened_l is ['PER', 'O', 'O', 'GEO', 'ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'GEO', 'O', 'O', 'PER', 'O']

le = LabelEncoder().fit(flattened_l)

# See the mapping generated by the encoder:
list(enumerate(le.classes_))
# [(0, 'GEO'), (1, 'O'), (2, 'ORG'), (3, 'PER')]

# And, finally, transform each sublist:
res = [le.transform(sublist).tolist() for sublist in l]  # .tolist() yields plain Python ints
res

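# Result: a consistent integer encoding of the 2D list. The integers differ
# from the example in the question because LabelEncoder numbers the classes
# in sorted order (GEO=0, O=1, ORG=2, PER=3):
# [[3, 1, 1, 0], [2, 1, 1, 1], [1, 1, 1, 0], [1, 1, 3, 1]]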
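
If every sublist has the same length, the flatten-then-transform idea can also be written with NumPy. This is only a sketch of an alternative, not part of the answer above: it assumes NumPy is installed and that the list is rectangular, and it flattens the array, encodes it in one call, and restores the 2D shape. It also shows decoding back with inverse_transform.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

l = [
    ['PER', 'O', 'O', 'GEO'],
    ['ORG', 'O', 'O', 'O'],
    ['O', 'O', 'O', 'GEO'],
    ['O', 'O', 'PER', 'O'],
]

# Assumption: all rows have equal length, so this becomes a (4, 4) array.
arr = np.array(l)
le = LabelEncoder()

# LabelEncoder expects 1D input: flatten, encode, then restore the shape.
encoded = le.fit_transform(arr.ravel()).reshape(arr.shape)

# Round trip: map the integers back to the original string labels.
decoded = le.inverse_transform(encoded.ravel()).reshape(arr.shape)

print(encoded.tolist())
print(decoded.tolist() == l)
```

For ragged lists (sublists of different lengths) this reshape approach does not apply, and the per-sublist transform shown above is the safer choice.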

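If the exact integers from the question matter (O as 0, PER as 1, GEO as 2, ORG as 3), note that LabelEncoder assigns integers by sorted class order, so it will not produce that numbering for these labels. One possible workaround is a hand-written dictionary in plain Python. The names label_to_id and id_to_label, and the mapping itself, are illustrative assumptions chosen to match the question's example.

```python
l = [
    ['PER', 'O', 'O', 'GEO'],
    ['ORG', 'O', 'O', 'O'],
    ['O', 'O', 'O', 'GEO'],
    ['O', 'O', 'PER', 'O'],
]

# Hypothetical mapping, picked to reproduce the integers in the question.
label_to_id = {'O': 0, 'PER': 1, 'GEO': 2, 'ORG': 3}

# Encode every label in every sublist; a KeyError signals an unknown label.
encoded = [[label_to_id[label] for label in row] for row in l]

# Reverse mapping for decoding the integers back to labels.
id_to_label = {idx: label for label, idx in label_to_id.items()}
decoded = [[id_to_label[idx] for idx in row] for row in encoded]

print(encoded)
```

This works for ragged lists as well, at the cost of maintaining the mapping by hand.
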
Answer 1
1 · Accepted · 2021-04-20 18:54:17