
How to train ML model on 2 columns to solve for classification?

I have three columns in a dataset on which I'm doing sentiment analysis (classes 0, 1, 2):

text    thing    sentiment

But the problem is that I can train my model only on either text or thing to get the predicted sentiment. Is there a way to train on both text and thing and then predict sentiment?

Problem case (say):

  |text  thing  sentiment
0 | t1   thing1    0
. |
. |
54| t1   thing2    2

This example shows that the sentiment also depends on thing. I could concatenate the two columns one below the other and train on that, but this would be incorrect, because the model would not be given any relationship between the two columns.
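To make the concern concrete, here is a minimal pure-Python sketch with made-up rows (the values are hypothetical and only mirror the problem case above). Stacking the columns doubles the number of samples and separates each text from its thing, while keeping one sample per row preserves the pairing a model would need.

```python
# Hypothetical toy rows: the same text gets different labels
# depending on the thing it refers to.
rows = [
    {"text": "t1", "thing": "thing1", "sentiment": 0},
    {"text": "t1", "thing": "thing2", "sentiment": 2},
]

# Stacking: every text and every thing becomes its own sample,
# so the text/thing pairing is gone and labels no longer line up.
stacked = [r["text"] for r in rows] + [r["thing"] for r in rows]

# Pairing: one sample per row, holding both fields together.
paired = [(r["text"], r["thing"]) for r in rows]
labels = [r["sentiment"] for r in rows]

print(len(stacked), len(labels))  # 4 2 -> sample/label mismatch
print(paired)                     # [('t1', 'thing1'), ('t1', 'thing2')]
```

Because both rows share the text t1 but carry different labels, only the paired form lets a model tell them apart.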

Also, my test set contains the two columns text and thing, for which I have to predict the sentiment using the model trained on both columns.

Right now I'm using the tokenizer and then the model below:

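from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense

MAX_NB_WORDS = 50000   # assumed vocabulary size
EMBEDDING_DIM = 100    # assumed embedding width
# X: padded token sequences produced by the tokenizer (not shown)

model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()  # prints the summary itself; print(model.summary()) would also print None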
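Whichever model ends up being used, each column can be tokenized into its own integer matrix, aligned so that row i of both matrices describes the same example. The sketch below is a simplified, pure-Python stand-in for Keras's Tokenizer and pad_sequences; the helpers build_vocab and to_padded and the sample data are hypothetical, and they only approximate that behaviour (frequency-ranked indices starting at 1, 0 reserved for padding, pre-padding and pre-truncation).

```python
from collections import Counter

def build_vocab(texts, num_words):
    """Map the most frequent words to 1..num_words-1 (0 is padding)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    ranked = [w for w, _ in counts.most_common(num_words - 1)]
    return {w: i + 1 for i, w in enumerate(ranked)}

def to_padded(texts, vocab, maxlen):
    """Convert texts to id lists, drop unknown words, pre-pad/truncate.

    Assumes maxlen > 0.
    """
    out = []
    for t in texts:
        ids = [vocab[w] for w in t.lower().split() if w in vocab]
        ids = ids[-maxlen:]                          # keep the last maxlen ids
        out.append([0] * (maxlen - len(ids)) + ids)  # pad on the left
    return out

# Hypothetical data: one vocabulary and one matrix per column.
texts = ["great phone", "battery died fast", "great screen"]
things = ["phone", "battery", "screen"]

text_vocab = build_vocab(texts, num_words=20)
thing_vocab = build_vocab(things, num_words=20)
X_text = to_padded(texts, text_vocab, maxlen=4)
X_thing = to_padded(things, thing_vocab, maxlen=1)

print(X_text)   # [[0, 0, 1, 2], [0, 3, 4, 5], [0, 0, 1, 6]]
print(X_thing)  # [[1], [2], [3]]
```

X_text and X_thing would then be passed together, as a two-element list, to a model with two inputs; a single-input Sequential model like the one above cannot accept them both.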

Any pointers on how to proceed, or which model or coding manipulation to use?

You may want to switch to the Keras functional API and train a multi-input model.

According to Keras's creator, François Chollet, in his book Deep Learning with Python [Manning, 2017] (chapter 7, section 1):

Some tasks require multimodal inputs: they merge data coming from different input sources, processing each type of data using different kinds of neural layers. Imagine a deep-learning model trying to predict the most likely market price of a second-hand piece of clothing, using the following inputs: user-provided metadata (such as the item's brand, age, and so on), a user-provided text description, and a picture of the item. If you had only the metadata available, you could one-hot encode it and use a densely connected network to predict the price. If you had only the text description available, you could use an RNN or a 1D convnet. If you had only the picture, you could use a 2D convnet. But how can you use all three at the same time? A naive approach would be to train three separate models and then do a weighted average of their predictions. But this may be suboptimal, because the information extracted by the models may be redundant. A better way is to jointly learn a more accurate model of the data by using a model that can see all available input modalities simultaneously: a model with three input branches.
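As a rough illustration of what "a model with input branches" computes, the NumPy sketch below runs one forward pass with untrained random weights. Each branch turns its own token ids into a feature vector (here by averaging embeddings, a simplification standing in for an RNN), the two feature vectors are concatenated, and a single dense softmax layer produces three class probabilities. All sizes, names and inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(token_ids, embeddings):
    """Average the embeddings of non-padding tokens (id 0 = padding)."""
    vecs = embeddings[token_ids]                  # (batch, seq, dim)
    mask = (token_ids > 0)[..., None]             # (batch, seq, 1)
    return (vecs * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical sizes: two vocabularies, two embedding widths, 3 classes.
text_emb = rng.normal(size=(50, 8))    # text vocabulary of 50, dim 8
thing_emb = rng.normal(size=(10, 4))   # thing vocabulary of 10, dim 4
W = rng.normal(size=(8 + 4, 3))        # dense layer on the merged features
b = np.zeros(3)

X_text = np.array([[0, 0, 1, 2], [0, 3, 4, 5]])   # padded text ids
X_thing = np.array([[1], [2]])                    # padded thing ids

merged = np.concatenate([branch(X_text, text_emb),
                         branch(X_thing, thing_emb)], axis=1)  # (2, 12)
probs = softmax(merged @ W + b)                                # (2, 3)
print(probs.shape)  # (2, 3)
```

Training would adjust text_emb, thing_emb, W and b jointly by backpropagating through both branches; that joint learning is what the quoted passage contrasts with averaging the predictions of separately trained models.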

I think the Concatenate layer is the way to go in such a case, and the general idea should be as follows. Please tweak it according to your use case.

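from tensorflow.keras.layers import Input, Concatenate, Dense
from tensorflow.keras.models import Model

### whatever preprocessing you may want to do
text_input = Input(shape=(1,))
thing_input = Input(shape=(1,))

### now bring them together
merged_inputs = Concatenate(axis=1)([text_input, thing_input])

### sample output layer (softmax over the 3 sentiment classes)
output = Dense(3, activation="softmax")(merged_inputs)

### pass your inputs and outputs to the model
model = Model(inputs=[text_input, thing_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])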
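A fuller sketch, assuming TensorFlow 2.x with tf.keras and that each column has already been tokenized and padded into integer arrays: each input gets its own Embedding + LSTM branch, the branch outputs are concatenated, and a softmax layer predicts the three classes. The vocabulary sizes, sequence lengths, layer widths and the random stand-in data are placeholders, not values from the question.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.utils import to_categorical

# Placeholder sizes; replace with the values from your tokenizers.
TEXT_VOCAB, TEXT_LEN = 5000, 50
THING_VOCAB, THING_LEN = 500, 5

# One input per column: padded integer token ids.
text_input = Input(shape=(TEXT_LEN,), name="text")
thing_input = Input(shape=(THING_LEN,), name="thing")

# A separate branch for each column.
text_features = LSTM(64, dropout=0.2)(Embedding(TEXT_VOCAB, 100)(text_input))
thing_features = LSTM(16)(Embedding(THING_VOCAB, 16)(thing_input))

# Join the branches so the classifier sees both columns at once.
merged = Concatenate(axis=1)([text_features, thing_features])
output = Dense(3, activation="softmax")(merged)

model = Model(inputs=[text_input, thing_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Random stand-in data, only to show the call signatures.
rng = np.random.default_rng(0)
X_text = rng.integers(1, TEXT_VOCAB, size=(32, TEXT_LEN))
X_thing = rng.integers(1, THING_VOCAB, size=(32, THING_LEN))
y = to_categorical(rng.integers(0, 3, size=32), num_classes=3)

model.fit([X_text, X_thing], y, epochs=1, batch_size=8, verbose=0)
preds = model.predict([X_text[:4], X_thing[:4]], verbose=0)
print(preds.shape)  # (4, 3)
```

At prediction time, the test set's text and thing columns would be tokenized with the same tokenizers fitted on the training data and passed as a two-element list in the same order as the model's inputs.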

You have to take the multiple columns as a list and then merge them for training, after preprocessing and embedding the raw data. Example:

import pandas as pd

train = pd.read_csv('COVID19 multifeature Emotion - 50 data.csv', nrows=49)
# This dataset has two text columns and different class labels

X_train_doctor_opinion = train["doctor-opinion"].str.lower()
X_train_patient_opinion = train["patient-opinion"].str.lower()

# One list: all doctor opinions first, then all patient opinions
X_train = list(X_train_doctor_opinion) + list(X_train_patient_opinion)

Then preprocess and embed.

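Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.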

Related questions
- How to create a target from multiple columns to train a classification model?
- Start train model on Azure ML
- How to train ML model to get more than one possible output?
- How to train image classification model using tf.data pipeline?
- How to configure and train the model using Glove and CNN for text classification?
- Drop and include columns in a DataFrame in all possible combinations (To train a ML model) except one column (target column)
- is there a way to train a ML model on multiple laptops?
- Classification ML Model Training with Unbalanced Dataset
- How can I solve this error that arises when trying to train a model?
- How to train large Dataset for classification
 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM