How do I train an ML model on 2 columns to solve a classification problem?

Question

I have three columns in a dataset on which I'm doing sentiment analysis (classes 0, 1, 2):

text    thing    sentiment

The problem is that I can train my model only on either text or thing to get a predicted sentiment. Is there a way to train on both text and thing and then predict sentiment?

Problem case (say):

  |text  thing  sentiment
0 | t1   thing1    0
. |
. |
54| t1   thing2    2

This example tells us that sentiment depends on thing as well. I could concatenate the two columns one below the other and train on that, but that would be incorrect, because it would not give the model any relationship between the two columns.

Also, my test set contains two columns, text and thing, for which I have to predict the sentiment using the model trained on both columns.

Right now I'm using a tokenizer and then the model below:

model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

Any pointers on how to proceed, or on which model or coding approach to use?

Answer 1

You may want to switch to the Keras functional API and train a multi-input model.

According to Keras's creator, François Chollet, in his book Deep Learning with Python [Manning, 2017] (chapter 7, section 1):

Some tasks require multimodal inputs: they merge data coming from different input sources, processing each type of data using different kinds of neural layers. Imagine a deep-learning model trying to predict the most likely market price of a second-hand piece of clothing, using the following inputs: user-provided metadata (such as the item's brand, age, and so on), a user-provided text description, and a picture of the item. If you had only the metadata available, you could one-hot encode it and use a densely connected network to predict the price. If you had only the text description available, you could use an RNN or a 1D convnet. If you had only the picture, you could use a 2D convnet. But how can you use all three at the same time? A naive approach would be to train three separate models and then do a weighted average of their predictions. But this may be suboptimal, because the information extracted by the models may be redundant. A better way is to jointly learn a more accurate model of the data by using a model that can see all available input modalities simultaneously: a model with three input branches.
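Applied to the question's text and thing columns, a two-branch version of that idea might look like the sketch below. It is only an illustration under several assumptions: the tensorflow.keras import path, the sizes (MAX_NB_WORDS, MAX_SEQ_LEN, NUM_THINGS and the layer widths), the choice to encode thing as one integer id per row, and the random arrays standing in for real tokenized data are all made up here and are not part of the original question or answer.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Concatenate, Flatten
from tensorflow.keras.models import Model

# Hypothetical sizes, chosen small so the sketch runs quickly.
MAX_NB_WORDS = 1000   # vocabulary size of the text tokenizer
MAX_SEQ_LEN = 10      # padded length of each tokenized text
NUM_THINGS = 20       # number of distinct ids in the `thing` column
EMBEDDING_DIM = 16

# Branch 1: padded token ids from `text`, same layer types as in the question.
text_input = Input(shape=(MAX_SEQ_LEN,), name="text")
x = Embedding(MAX_NB_WORDS, EMBEDDING_DIM)(text_input)
x = LSTM(16, dropout=0.2)(x)

# Branch 2: one integer id per row from `thing` (an assumed encoding).
thing_input = Input(shape=(1,), name="thing")
t = Embedding(NUM_THINGS, 4)(thing_input)
t = Flatten()(t)

# Merge both branches so the classifier sees text and thing together.
merged = Concatenate()([x, t])
output = Dense(3, activation="softmax")(merged)

model = Model(inputs=[text_input, thing_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Random stand-in data; real data would come from the tokenizer and an id mapping.
rng = np.random.default_rng(0)
X_text = rng.integers(1, MAX_NB_WORDS, size=(32, MAX_SEQ_LEN))
X_thing = rng.integers(0, NUM_THINGS, size=(32, 1))
y = np.eye(3)[rng.integers(0, 3, size=32)]  # one-hot labels for classes 0, 1, 2

# Both inputs are passed as a list, in the same order as in Model(inputs=...).
model.fit([X_text, X_thing], y, epochs=1, batch_size=8, verbose=0)
probs = model.predict([X_text, X_thing], verbose=0)
pred = probs.argmax(axis=1)  # predicted class per row
```

The test set would be fed the same way: tokenize and pad its text column, map its thing column to ids with the mapping built on the training set, and pass both arrays as a list to model.predict.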

Answer 2

I think the Concatenate layer is the way to go in such a case, and the general idea is as follows. Please tweak it to fit your use case.

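# imports assume the tf.keras packaging of Keras
from tensorflow.keras.layers import Input, Concatenate, Dense
from tensorflow.keras.models import Model

### whatever preprocessing you may want to do
### (shape=(1,) is a placeholder; use the real feature length of each input)
text_input = Input(shape=(1,))
thing_input = Input(shape=(1,))

### now bring them together
merged_inputs = Concatenate(axis=1)([text_input, thing_input])

### sample output layer: 3 classes, softmax to match categorical_crossentropy
output = Dense(3, activation='softmax')(merged_inputs)

### pass your inputs and outputs to the model
model = Model(inputs=[text_input, thing_input], outputs=output)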
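The "whatever preprocessing" step has to turn thing into numbers before it can be fed to an Input layer. One possible way, sketched here with plain Python, is to map each distinct value to an integer id and reserve 0 for values that only appear in the test set. The helper names build_vocab and encode and the sample values are hypothetical and serve only as illustration.

```python
def build_vocab(values):
    """Map each distinct value to an integer id, reserving 0 for unseen values."""
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab) + 1
    return vocab


def encode(values, vocab):
    """Replace each value by its id; values missing from the vocabulary become 0."""
    return [vocab.get(value, 0) for value in values]


# Hypothetical column contents.
train_things = ["thing1", "thing2", "thing1", "thing3"]
test_things = ["thing2", "thing4"]

# Build the mapping on the training set only, then reuse it for the test set.
thing_vocab = build_vocab(train_things)
train_ids = encode(train_things, thing_vocab)
test_ids = encode(test_things, thing_vocab)

print(train_ids)  # [1, 2, 1, 3]
print(test_ids)   # [2, 0]
```

Building the mapping from the training column alone keeps the ids consistent between training and prediction.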

Answer 3

You have to take the multiple columns as lists and then merge them for training, after preprocessing and embedding the raw data. Example:

import pandas as pd

train = pd.read_csv('COVID19 multifeature Emotion - 50 data.csv', nrows=49)
# This dataset has two text columns and a separate class label

X_train_doctor_opinion = train["doctor-opinion"].str.lower()
X_train_patient_opinion = train["patient-opinion"].str.lower()

# Append the second column's values after the first column's values
X_train = list(X_train_doctor_opinion) + list(X_train_patient_opinion)

Then preprocess and embed.
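Note that the list concatenation above produces twice as many samples as there are labels and stacks one column below the other, which is the approach the question wanted to avoid. A different technique that keeps each row's two values together is to join them row by row into a single string before tokenizing. The sketch below is not this answer's method; the sample rows mirror the question's problem case, and the separator token is an arbitrary, hypothetical choice.

```python
SEP = " xxsep "  # hypothetical separator token between the two fields

# Toy rows mirroring the question's problem case.
texts = ["t1", "t1"]
things = ["thing1", "thing2"]
sentiments = [0, 2]

# Join the two columns row by row, so each sample still lines up with its label.
combined = [f"{thing}{SEP}{text}" for text, thing in zip(texts, things)]

print(combined)  # ['thing1 xxsep t1', 'thing2 xxsep t1']
```

The combined strings can then go through the same tokenizer and single-input model as before, with one sample per label.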

3 answers

Answer 1
1 accepted 2019-08-14 06:28:02

Answer 2
1 2019-08-14 06:53:53

Answer 3
0 2021-06-19 16:50:58
