
Scala - Spark-corenlp - java.lang.NoClassDefFoundError

I want to run the spark-corenlp example, but I get a java.lang.NoClassDefFoundError when running spark-submit.

Here is the Scala code from the GitHub example, which I put into an object, defining a SparkContext and an SQLContext.

main.scala.Sentiment.scala

package main.scala


import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SQLContext

import com.databricks.spark.corenlp.functions._


// Lazily create and reuse a single SQLContext for the application
object SQLContextSingleton {

  @transient  private var instance: SQLContext = _

  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}


object Sentiment {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Sentiment")
    val sc = new SparkContext(conf)
    val sqlContext = SQLContextSingleton.getInstance(sc)
    import sqlContext.implicits._ 


    val input = Seq((1, "<xml>Stanford University is located in California. It is a great university.</xml>")).toDF("id", "text")

    // Strip the XML, split into sentences, then tokenize, NER-tag, and score sentiment per sentence
    val output = input
      .select(cleanxml('text).as('doc))
      .select(explode(ssplit('doc)).as('sen))
      .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))

    output.show(truncate = false)
  }
}

And my build.sbt (modified from here):

version := "1.0"

scalaVersion := "2.10.6"

scalaSource in Compile := baseDirectory.value / "src"

initialize := {
  val _ = initialize.value
  val required = VersionNumber("1.8")
  val current = VersionNumber(sys.props("java.specification.version"))
  assert(VersionNumber.Strict.isCompatible(current, required), s"Java $required required.")
}

sparkVersion := "1.5.2"

// change the value below to change the directory where your zip artifact will be created
spDistDirectory := target.value

sparkComponents += "mllib"

// add any sparkPackageDependencies using sparkPackageDependencies.
// e.g. sparkPackageDependencies += "databricks/spark-avro:0.1"
spName := "databricks/spark-corenlp"

licenses := Seq("GPL-3.0" -> url("http://opensource.org/licenses/GPL-3.0"))

resolvers += Resolver.mavenLocal


libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)
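
The spName, sparkVersion, sparkComponents, and spDistDirectory keys above come from the sbt-spark-package plugin, so the build also needs that plugin enabled. A minimal project/plugins.sbt sketch, assuming the 0.2.x line of the plugin (the version and resolver shown here are assumptions, not taken from the original build):

// project/plugins.sbt -- assumed plugin setup for the spark-package keys used in build.sbt
resolvers += "Spark Packages repo" at "https://dl.bintray.com/spark-packages/maven/"

addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.3")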

I run sbt package without issue, then run Spark with:

spark-submit --class "main.scala.Sentiment" --master local[4] target/scala-2.10/sentimentanalizer_2.10-1.0.jar

The program fails with the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/simple/Sentence
    at main.scala.com.databricks.spark.corenlp.functions$$anonfun$cleanxml$1.apply(functions.scala:55)
    at main.scala.com.databricks.spark.corenlp.functions$$anonfun$cleanxml$1.apply(functions.scala:54)
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:75)
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:74)

Things I tried:

I work with Eclipse for Scala, and I made sure to add all the stanford-corenlp jars, as suggested here:

./stanford-corenlp/ejml-0.23.jar
./stanford-corenlp/javax.json-api-1.0-sources.jar
./stanford-corenlp/javax.json.jar
./stanford-corenlp/joda-time-2.9-sources.jar
./stanford-corenlp/joda-time.jar
./stanford-corenlp/jollyday-0.4.7-sources.jar
./stanford-corenlp/jollyday.jar
./stanford-corenlp/protobuf.jar
./stanford-corenlp/slf4j-api.jar
./stanford-corenlp/slf4j-simple.jar
./stanford-corenlp/stanford-corenlp-3.6.0-javadoc.jar
./stanford-corenlp/stanford-corenlp-3.6.0-models.jar
./stanford-corenlp/stanford-corenlp-3.6.0-sources.jar
./stanford-corenlp/stanford-corenlp-3.6.0.jar
./stanford-corenlp/xom-1.2.10-src.jar
./stanford-corenlp/xom.jar

I suspect that I need to add something to my command line when submitting the job to Spark. Any thoughts?

I was on the right track: my command line was missing something.

spark-submit needs to have all the stanford-corenlp jars added:

spark-submit \
  --jars $(echo stanford-corenlp/*.jar | tr ' ' ',') \
  --class "main.scala.Sentiment" \
  --master local[4] target/scala-2.10/sentimentanalizer_2.10-1.0.jar
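
An alternative to listing every jar on the command line (a common option, not the approach used above) is to build a fat jar so the Stanford classes ship inside the application jar itself. A minimal build sketch, assuming the sbt-assembly plugin; the plugin version and merge strategy below are illustrative assumptions:

// project/plugins.sbt -- assumed; pick the sbt-assembly version matching your sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt additions -- drop duplicate META-INF entries when merging dependency jars
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

This assumes the Spark dependencies themselves stay in provided scope (which the spark-package plugin's sparkComponents setting normally handles). Running sbt assembly then produces a single jar under target/scala-2.10/ that can be passed to spark-submit without the --jars list; the trade-off is a much larger artifact, since the stanford-corenlp models jar gets bundled too.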
