
How to combine embedding vectors of BERT with other features?

I am working on a classification task with 3 labels (0, 1, 2 = neg, pos, neu). The data are sentences. To produce vectors/embeddings of the sentences, I use a BERT encoder to get an embedding for each sentence, and then I use a simple KNN to make predictions.

My data look like this: each sentence has a label and other numerical classification values.

For example, my data look like this:

Sentence   embeddings_BERT   level   sub-level   label

je mange   [0.21, 0.56]      2       2.1         pos
il hait    [0.25, 0.39]      3       3.1         neg
.....

As you can see, each sentence has other categories. They are not the final label, but indices that helped a human figure out the label when annotating the data. I want my model to take those two values into consideration when predicting the label. I was wondering whether I have to concatenate them with the embeddings generated by the BERT encoder, or whether there is another way?

There is no single perfect way to tackle this problem, but a simple solution is to concatenate the BERT embeddings with the hard-coded features. The BERT embeddings (sentence embeddings) will be of dimension 768 (if you have used BERT base). These embeddings can be treated as features of the sentence itself. The additional features can be concatenated to them to form a higher-dimensional vector. If a feature is categorical, it is best to convert it to a one-hot vector before concatenating. For example, if you want to use level from your example as an input feature, it is best to convert it into a one-hot feature vector and then concatenate it with the BERT embeddings. However, in some cases your hard-coded features can become dominant and bias the classifier, and in other cases they can have no influence at all. It all depends on the data that you have.
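A minimal sketch of that idea, using toy 2-d vectors in place of the real 768-d BERT embeddings (the numbers and labels below are taken from the example table, not from any real dataset): one-hot encode the categorical level column, concatenate it with the embeddings along the feature axis, and feed the combined vectors to a KNN classifier.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-ins for the BERT sentence embeddings (2-d here; 768-d with BERT base).
embeddings = np.array([[0.21, 0.56],
                       [0.25, 0.39]])

# Categorical "level" feature, one-hot encoded so KNN does not treat it as ordinal.
levels = np.array([2, 3])
categories = np.unique(levels)                                    # [2, 3]
level_onehot = (levels[:, None] == categories[None, :]).astype(float)

# Concatenate along the feature axis: one combined vector per sentence.
features = np.concatenate([embeddings, level_onehot], axis=1)     # shape (2, 4)

labels = ["pos", "neg"]
knn = KNeighborsClassifier(n_neighbors=1).fit(features, labels)
```

Note that the embedding values and the one-hot columns live on different scales; in practice you may want to standardize the continuous features (e.g. with `sklearn.preprocessing.StandardScaler`) so that neither group of features dominates the distance metric.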

