Using Stanford NLP in Spark, error "Class java.util.function.Function not found - continuing with a stub."
I need to do some text preprocessing in Spark 1.6. Taking the answer from Simplest method for text lemmatization in Scala and Spark, it is required to
import java.util.Properties
But when running sbt compile and assembly, I got the following error:
[warn] Class java.util.function.Function not found - continuing with a stub.
[warn] Class java.util.function.Function not found - continuing with a stub.
[warn] Class java.util.function.Function not found - continuing with a stub.
[error] Class java.util.function.Function not found - continuing with a stub.
[error] Class java.util.function.Function not found - continuing with a stub.
[warn] four warnings found
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 52 s, completed Feb 10, 2016 2:11:12 PM
The code is as follows:
// ref https://stackoverflow.com/questions/30222559/simplest-methodfor-text-lemmatization-in-scala-and-spark?rq=1
def plainTextToLemmas(text: String): Seq[String] = {
  import java.util.Properties
  import edu.stanford.nlp.ling.CoreAnnotations._
  import edu.stanford.nlp.pipeline._
  import scala.collection.JavaConversions._
  import scala.collection.mutable.ArrayBuffer

  // val stopWords = Set("stopWord")
  val props = new Properties()
  props.put("annotators", "tokenize, ssplit, pos, lemma")
  val pipeline = new StanfordCoreNLP(props)
  val doc = new Annotation(text)
  pipeline.annotate(doc)

  val lemmas = new ArrayBuffer[String]()
  val sentences = doc.get(classOf[SentencesAnnotation])
  for (sentence <- sentences;
       token <- sentence.get(classOf[TokensAnnotation])) {
    val lemma = token.get(classOf[LemmaAnnotation])
    if (lemma.length > 2) {
      lemmas += lemma.toLowerCase
    }
  }
  lemmas
}
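For context, here is a minimal sketch of how this function might be driven from Spark. The SparkContext `sc` and the sample sentences are assumptions for illustration, not part of the original code; since constructing a StanfordCoreNLP pipeline is expensive, real jobs would usually build it once per partition via mapPartitions rather than once per document.

// Hypothetical usage sketch: `sc` is an existing SparkContext and the
// sample sentences are made up for illustration.
val docs = sc.parallelize(Seq(
  "Cats are chasing the mice.",
  "The dog chased the cats."))
// Note: this builds one pipeline per document; for real data, prefer
// mapPartitions so a single pipeline is reused within each partition.
val lemmatized = docs.map(plainTextToLemmas)
lemmatized.collect().foreach(println)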
My sbt file is as follows:
scalaVersion := "2.11.7"

crossScalaVersions := Seq("2.10.5", "2.11.0-M8")

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided",
  "org.apache.spark" % "spark-mllib_2.10" % "1.6.0" % "provided",
  "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided",
  "com.github.scopt" % "scopt_2.10" % "3.3.0"
)

libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models"
  // "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models-chinese"
  // "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models-german"
  // "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models-spanish"
  // "com.google.code.findbugs" % "jsr305" % "2.0.3"
)
Taking the suggestion from that site, I changed the Java library version from 1.7 to 1.8, but the problem is still there.
The problem was solved by setting JAVA_HOME to Java 8. Previously I had only changed the project SDK to Java 8 while JAVA_HOME still pointed to Java 7, so it did not take effect when sbt compiled.
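For anyone hitting the same warnings: the "continuing with a stub" message appears because scalac is running on a Java 7 JDK that has no java.util.function package, while the CoreNLP 3.5.2 class files reference it. A quick way to confirm which JVM sbt (and therefore the compiler) is actually using is to evaluate it from the sbt shell; this is a hedged sketch, and the expected output values are illustrative only.

// From the sbt shell: print the JVM that sbt itself runs on.
// After fixing JAVA_HOME, both should report a Java 8 installation.
> eval System.getProperty("java.version")
> eval System.getProperty("java.home")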