
Sentiment analysis using Hidden Markov Model

I have a list of reviews; each element is one review from the IMDB data set on Kaggle. There are 25,000 reviews in total, and each has a label: +1 for positive and -1 for negative.

I want to train a Hidden Markov Model with these reviews and labels.

1- What sequence should I give to the HMM? Is it something like bag of words, or something else, such as probabilities that I need to calculate? What kind of feature extraction method is appropriate? I was told to use bag of words on the list of reviews, but after searching a little I found out that an HMM cares about order, while bag of words does not preserve the order of words in a sequence. How should I prepare this list of reviews so that I can feed it into an HMM?
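To make the distinction in this question concrete, here is a minimal sketch in plain Python. The two toy reviews are invented for illustration and are not from the IMDB data. It shows that bag-of-words counts cannot tell the reviews apart, while an integer-ID encoding keeps word order. The concatenated `observations` list with a separate `lengths` list mirrors the layout that hmmlearn and seqlearn accept through their `lengths` argument; whether this encoding suits a particular model is an assumption to check against that library's documentation.

```python
from collections import Counter

# Toy reviews (hypothetical): same words, different order
reviews = ["not good , really bad", "not bad , really good"]
tokenized = [r.split() for r in reviews]

# Bag of words: only the counts survive, so both reviews look identical
bags = [Counter(tokens) for tokens in tokenized]

# Order-preserving encoding: map each word to an integer ID, keep the order
vocab = {}
for tokens in tokenized:
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
sequences = [[vocab[tok] for tok in tokens] for tokens in tokenized]

# Sequence layout: all observations concatenated, plus one length per review
observations = [i for seq in sequences for i in seq]
lengths = [len(seq) for seq in sequences]

print(bags[0] == bags[1])  # True: bag of words cannot distinguish them
print(sequences)           # [[0, 1, 2, 3, 4], [0, 4, 2, 3, 1]]
print(lengths)             # [5, 5]
```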

2- Is there a framework for this? I know hmmlearn, and I think I should use MultinomialHMM; correct me if I'm wrong. But it is not supervised: its models do not take labels as input during training, and I get some odd errors that I don't know how to solve, which goes back to my first question about the correct type of input. seqlearn is one I found recently; is it good, or is there a better one to use?

I would appreciate any guidance, since I have almost zero knowledge of NLP.

I was able to do it somehow with surprisingly good accuracy, yet I am not sure exactly what happened. I used the seqlearn framework, which has poor documentation. I really suggest using MATLAB instead of Python for HMMs.

I used sklearn's TfidfVectorizer for feature extraction, then I did this:

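import numpy as np
from seqlearn.hmm import MultinomialHMM
from sklearn.feature_extraction.text import TfidfVectorizer

# train_review / test_review: lists of review strings; Y: labels (+1 / -1)
vectorizer = TfidfVectorizer(norm=None)
x_train = vectorizer.fit_transform(train_review)
x_test = vectorizer.transform(test_review)

# Each review is passed as a sequence of length 1 (lengths must sum to the number of rows)
len_train_seq = np.ones(len(train_review), dtype=int)
len_test_seq = np.ones(len(test_review), dtype=int)

model = MultinomialHMM()
HMM_Classifier = model.fit(x_train, Y, lengths=len_train_seq)
y_predict = HMM_Classifier.predict(x_test, lengths=len_test_seq)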
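For readers unsure what the `lengths` argument does, the sketch below shows the usual convention: the rows are stacked, and `lengths` says how many consecutive rows form each sequence. The helper `split_by_lengths` is hypothetical, written only for illustration, and is not part of seqlearn. With all-ones lengths, as in the snippet above, every row is its own one-step sequence, so there are no transitions inside any sequence. That may be why the result behaves much like an ordinary per-review classifier on TF-IDF features.

```python
from itertools import accumulate

def split_by_lengths(rows, lengths):
    """Split stacked rows into sequences; lengths must sum to len(rows)."""
    if sum(lengths) != len(rows):
        raise ValueError("lengths must sum to the number of rows")
    ends = list(accumulate(lengths))   # end index of each sequence
    starts = [0] + ends[:-1]           # start index of each sequence
    return [rows[s:e] for s, e in zip(starts, ends)]

rows = ["r0", "r1", "r2", "r3"]  # stand-ins for the rows of x_train

# All-ones lengths: four sequences of a single step each
print(split_by_lengths(rows, [1, 1, 1, 1]))  # [['r0'], ['r1'], ['r2'], ['r3']]

# A different grouping: one 3-step sequence followed by one 1-step sequence
print(split_by_lengths(rows, [3, 1]))        # [['r0', 'r1', 'r2'], ['r3']]
```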

I would still appreciate it if someone knowledgeable about HMMs could give a more robust and clean guideline for doing sentiment analysis with an HMM.
