Sentiment analysis using bigrams

Question

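So I have some reviews that I'm trying to classify as positive or negative. I'm trying to do this with NLTK and Stanford CoreNLP. I was able to do it with unigrams, but it doesn't work with bigrams. I tried the following for bigrams: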

import nltk
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.stem import WordNetLemmatizer

def classifySentence(sen):
  wn_lem = WordNetLemmatizer()
  pos = 0
  neg = 0
  stop_words = set(stopwords.words('english'))
  # note: NLTK's English stop-word list includes "not", so negations are dropped here
  filtered_review = [token for token in nltk.word_tokenize(sen) if token not in stop_words]

  # each item yielded by nltk.bigrams is a (word1, word2) tuple, not a string
  for token in nltk.bigrams(filtered_review):
      #lemma = wn_lem.lemmatize(token)
      # print("lemma="+token)
      if len(wn.synsets(token))>0:
          synset = wn.synsets(token)[0]
          #print("synset.name="+synset.name())

          sent = swn.senti_synset(synset.name())

          #print("Sentiment of "+token+" "+str(sent))

          pos = pos + sent.pos_score()
          neg = neg + sent.neg_score()
          # print (token + "(pos_score): " + str(pos) +"\n")
          # print (token + "(neg_score): " + str(neg) +"\n")
  #print (filtered_review)
  JoinedTokens = ' '.join(wo for wo in filtered_review)
  return [JoinedTokens, pos, neg]

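I'd like to know if anyone can suggest how to do this. I'd prefer to use NLTK, or possibly Stanford CoreNLP. I'm also open to other Python packages, but I need some guidance. I've written some code using CoreNLP, but it doesn't work either. Here is the code I wrote: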

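import nltk
from pycorenlp import StanfordCoreNLP  # assumed wrapper; the post doesn't say which package was used

def StanfordBigrams():
  nlp = StanfordCoreNLP('http://localhost:9000')
  operations = {'annotators': 'tokenize,lemma,pos,sentiment', 'outputFormat': 'json'}
  string = "not bad"
  tok = nltk.word_tokenize(string)
  bigrams = nltk.bigrams(tok)
  # str() of the generator returned by nltk.bigrams is its repr ("<generator object ...>"), not the words
  res = nlp.annotate(str(bigrams),operations)
  for s in res["sentences"]:
      for token in s["tokens"]:
          print("Sentiment: "+str(s["sentiment"])+" SentimentValue: "+str(s["sentimentValue"]))
          print (token)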
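The str(bigrams) call above sends the generator's text representation to the server rather than the review itself. A minimal sketch of building the request from the raw sentence instead, assuming a CoreNLP server at http://localhost:9000 as in the question. To stay dependency-free it swaps the question's StanfordCoreNLP wrapper for the standard library, relying on the server convention of taking the text as the POST body and the annotator settings as a JSON `properties` URL parameter. The names build_request, annotate and sentence_sentiments are hypothetical helpers, and the annotator list adds ssplit and parse because the sentiment annotator depends on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server running at this address, as in the question.
SERVER_URL = "http://localhost:9000"

# sentiment relies on sentence splitting and constituency parses,
# so ssplit and parse are listed alongside the question's annotators.
PROPERTIES = {
    "annotators": "tokenize,ssplit,pos,lemma,parse,sentiment",
    "outputFormat": "json",
}


def build_request(text, url=SERVER_URL, properties=PROPERTIES):
    """Build a POST request whose body is the raw sentence text (hypothetical helper)."""
    # Compact separators keep spaces out of the encoded JSON parameter.
    props = json.dumps(properties, separators=(",", ":"))
    query = urllib.parse.urlencode({"properties": props})
    return urllib.request.Request(
        url + "/?" + query,
        data=text.encode("utf-8"),
        method="POST",
    )


def annotate(text):
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(build_request(text)) as response:
        return json.loads(response.read().decode("utf-8"))


def sentence_sentiments(result):
    """Return (sentiment, sentimentValue) for each sentence in a decoded reply."""
    return [(s["sentiment"], s["sentimentValue"]) for s in result["sentences"]]
```

With a server running, sentence_sentiments(annotate("It was not bad.")) gives one sentence-level label for the whole review, which is the unit the classifier was trained on.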

I'd appreciate it if anyone could point me in the right direction.

Answer 1

Are you training a sentiment classifier, or just trying to use one? Technically, I suspect your error is in wn.synsets(bigram): I doubt that the tuples nltk.bigrams returns can be passed to WordNet.
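The tuple problem can be illustrated without WordNet at all. A minimal sketch, assuming a hypothetical toy lexicon in place of SentiWordNet: bigrams are produced the way nltk.bigrams produces them (consecutive tuples, here via zip), each tuple is joined into a string key in WordNet's multiword style ("not_bad", like "ice_cream"), and words not covered by a known bigram fall back to unigram scores. BIGRAM_LEXICON, UNIGRAM_LEXICON and their values are invented for illustration.

```python
# Hypothetical toy scores; real bigrams like "not_bad" are unlikely to have
# a WordNet/SentiWordNet entry, which is why a fallback is needed.
BIGRAM_LEXICON = {"not_bad": 0.5, "not_good": -0.5}
UNIGRAM_LEXICON = {"bad": -0.6, "good": 0.6, "great": 0.8}


def bigrams(tokens):
    """Same pairs as nltk.bigrams(tokens): consecutive (w1, w2) tuples."""
    return list(zip(tokens, tokens[1:]))


def bigram_key(pair):
    """Join a bigram tuple WordNet-style, e.g. ('not', 'bad') -> 'not_bad'."""
    return "_".join(word.lower() for word in pair)


def score(tokens):
    """Sum known bigram scores; uncovered tokens fall back to unigram scores."""
    total = 0.0
    covered = set()
    for i, pair in enumerate(bigrams(tokens)):
        key = bigram_key(pair)
        if key in BIGRAM_LEXICON:
            total += BIGRAM_LEXICON[key]
            covered.update({i, i + 1})
    for i, token in enumerate(tokens):
        if i not in covered:
            total += UNIGRAM_LEXICON.get(token.lower(), 0.0)
    return total
```

For ["not", "bad"] the bigram "not_bad" is found and the negative unigram "bad" is not counted again, giving 0.5. Note that this only helps if "not" survives preprocessing, since NLTK's English stop-word list removes it.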

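More importantly, though, you probably want to pass the whole sentence to the sentiment classifier. Bigrams won't be annotated with sentiment in something like SentiWordNet, and a trained sentiment classifier will have a much easier time with full sentences than with short fragments. You should be able to get the sentiment of some of the bigrams in a sentence (as opposed to the sentiment value at the root) from Stanford's sentiment tree. See the sentimentTree field in the CoreNLP server's JSON output.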
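A minimal sketch of reading bigram-level labels out of a sentiment tree, assuming the tree is written in the Stanford Sentiment Treebank's bracketed format, where every node is "(label ...)" and the label is a class from 0 (very negative) to 4 (very positive). Whether the server's sentimentTree field uses exactly this format is an assumption to check against real output, and the example tree and its labels are hand-written, not model output. The helper names are hypothetical.

```python
import re


def tokenize_tree(tree):
    """Split a bracketed tree into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", tree)


def parse_tree(tokens, i=0):
    """Parse one '(label ...)' node; return ((label, words, children), next index)."""
    if tokens[i] != "(":
        raise ValueError("expected '(' at token %d" % i)
    label = int(tokens[i + 1])
    i += 2
    words, children = [], []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse_tree(tokens, i)
            children.append(child)
            words.extend(child[1])  # a node covers all words of its children
        else:
            words.append(tokens[i])  # leaf word directly under this label
            i += 1
    return (label, words, children), i + 1


def two_word_spans(node):
    """Yield (phrase, label) for every subtree that covers exactly two words."""
    label, words, children = node
    if len(words) == 2:
        yield " ".join(words), label
    for child in children:
        yield from two_word_spans(child)


def bigram_sentiments(tree):
    root, _ = parse_tree(tokenize_tree(tree))
    return list(two_word_spans(root))
```

For the hand-written tree "(3 (2 It) (3 (2 was) (3 (1 not) (3 bad))))" this returns [("not bad", 3)]. Only bigrams that form a single subtree get their own label: "It was" is not a node in this tree, so nothing is reported for it, which is why the tree gives sentiment for some bigrams rather than all of them.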

Answer 1: score 0, accepted, 2018-02-26 23:26:01

使用bigrams进行情感分析

问题描述

1 个解决方案

解决方案1 0 已采纳 2018-02-26 23:26:01

解决方案1
0 已采纳 2018-02-26 23:26:01