Sentiment analysis using pyspark

Question

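I am new to pyspark. Can someone help me implement sentiment analysis in pyspark? I have already done the implementation in plain Python. Can anyone tell me what changes I need to make?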

import nltk
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
from nltk.classify import NaiveBayesClassifier

def format_sentence(sent):
  return({word: True for word in nltk.word_tokenize(sent)})

#print(format_sentence("The cat is very cute"))

pos = []
with open("./pos_tweets.txt") as f:
    for i in f:
        pos.append([format_sentence(i), 'pos'])

neg = []
with open("./neg_tweets.txt") as fp:
    for i in fp:
        neg.append([format_sentence(i), 'neg'])

# next, split labeled data into the training and test data
training = pos[:int((.8)*len(pos))] + neg[:int((.8)*len(neg))]
test = pos[int((.8)*len(pos)):] + neg[int((.8)*len(neg)):]

classifier = NaiveBayesClassifier.train(training)

example1 = "no!"

print(classifier.classify(format_sentence(example1)))

Answer 1

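The general pattern is:

- Convert your data into a DataFrame
df = spark.read.csv('./neg_tweets.txt')
- You can do a train/test split here:
df.randomSplit([0.8, 0.2])
- Find a suitable model: if naive bayes works for you, it would look something like this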
from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel

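Otherwise, spark.ml/mllib may not have anything built in specifically for sentiment analysis. You may need to look for external projects.
- Iterate: iterate on the model and tune its parameters.
- You can run an evaluator for the metrics you consider important for your problem. Some examples for binary classification problems are here: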

https://spark.apache.org/docs/2.2.0/mllib-evaluation-metrics.html#binary-classification

metrics = BinaryClassificationMetrics(predictionAndLabels)
