Why is there no support for SparkSession with named objects in Spark Job Server?
I am trying to build an application with the Spark Job Server API (for Spark 2.2.0), but I found that named objects do not seem to be supported with SparkSession. My code looks like this:
```scala
import com.typesafe.config.Config
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.scalactic._
import spark.jobserver.{NamedDataFrame, NamedObjectSupport, SparkSessionJob}
import spark.jobserver.api.{JobEnvironment, SingleProblem, ValidationProblem}
import scala.util.Try

object word1 extends SparkSessionJob with NamedObjectSupport {
  type JobData = Seq[String]
  type JobOutput = String

  def runJob(sparkSession: SparkSession, runtime: JobEnvironment, data: JobData): JobOutput = {
    val df = sparkSession.sparkContext.parallelize(data)
    val ndf = NamedDataFrame(df, true, StorageLevel.MEMORY_ONLY)
    this.namedObjects.update("df1", ndf)
    this.namedObjects.getNames().toString
  }

  def validate(sparkSession: SparkSession, runtime: JobEnvironment, config: Config):
    JobData Or Every[ValidationProblem] = {
    Try(config.getString("input.string").split(" ").toSeq)
      .map(words => Good(words))
      .getOrElse(Bad(One(SingleProblem("No input.string param"))))
  }
}
```
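But there is an error on the `this.namedObjects.update()` line, so I think named objects are not supported here. The same code compiles when the job extends `SparkJob` instead:
```scala
object word1 extends SparkJob with NamedObjectSupport
```
Does SparkSession support namedObjects? If not, what is the workaround for persisting a DataFrame/Dataset?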
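I figured it out. It was a silly mistake on my part. See https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server-api/src/main/scala/spark/jobserver/NamedObjectSupport.scala#L138 . As it says:
```scala
// NamedObjectSupport is not needed anymore due to JobEnvironment in api.SparkJobBase. It is also
// imported automatically to the old spark.jobserver.SparkJobBase for compatibility.
@Deprecated
trait NamedObjectSupport
```
So, to access this functionality, the named objects have to be reached through the `runtime` argument (the `JobEnvironment`) rather than through `this`. The code needs to be changed to: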
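```scala
import com.typesafe.config.Config
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.scalactic._
import spark.jobserver.{DataFramePersister, NamedDataFrame, NamedObjectPersister, SparkSessionJob}
import spark.jobserver.api.{JobEnvironment, SingleProblem, ValidationProblem}
import scala.util.Try

// The deprecated NamedObjectSupport mixin is dropped: namedObjects comes from JobEnvironment.
object word1 extends SparkSessionJob {
  type JobData = Seq[String]
  type JobOutput = String

  // Assumption (not in the original post): update() expects an implicit persister in scope.
  implicit val dataFramePersister: NamedObjectPersister[NamedDataFrame] = new DataFramePersister

  def runJob(sparkSession: SparkSession, runtime: JobEnvironment, data: JobData): JobOutput = {
    import sparkSession.implicits._
    // NamedDataFrame wraps a DataFrame, so the words are converted here
    // (the original post passed the RDD returned by parallelize).
    val df = data.toDF("word")
    val ndf = NamedDataFrame(df, true, StorageLevel.MEMORY_ONLY)
    // The actual fix: use runtime.namedObjects instead of this.namedObjects.
    runtime.namedObjects.update("df1", ndf)
    runtime.namedObjects.getNames().toString
  }

  def validate(sparkSession: SparkSession, runtime: JobEnvironment, config: Config):
    JobData Or Every[ValidationProblem] = {
    // Split the whitespace-separated input.string parameter into words.
    Try(config.getString("input.string").split(" ").toSeq)
      .map(words => Good(words))
      .getOrElse(Bad(One(SingleProblem("No input.string param"))))
  }
}
```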
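For the second half of the question (how to persist a DataFrame between jobs), a follow-up job can read the stored object back through the same `runtime.namedObjects` handle. The following is only a minimal sketch, not part of the original answer: the job name `ReadWords` and the row-count output are made up for illustration, and it assumes that `namedObjects.get` returns an `Option` as described in the spark-jobserver README, and that both jobs run in the same long-lived context.

```scala
import com.typesafe.config.Config
import org.apache.spark.sql.SparkSession
import org.scalactic._
import spark.jobserver.{NamedDataFrame, SparkSessionJob}
import spark.jobserver.api.{JobEnvironment, SingleProblem, ValidationProblem}

// Hypothetical follow-up job: reads back the DataFrame stored under "df1".
object ReadWords extends SparkSessionJob {
  type JobData = String   // name of the named object to read
  type JobOutput = Long   // number of rows in the stored DataFrame

  def runJob(sparkSession: SparkSession, runtime: JobEnvironment, data: JobData): JobOutput = {
    // Assumption: get returns Option[NamedDataFrame] for the given name.
    runtime.namedObjects.get[NamedDataFrame](data) match {
      case Some(NamedDataFrame(df, _, _)) => df.count()
      case None => throw new NoSuchElementException(s"Named object '$data' not found")
    }
  }

  def validate(sparkSession: SparkSession, runtime: JobEnvironment, config: Config):
    JobData Or Every[ValidationProblem] = {
    val name = "df1"
    // Reject the job early if the first job has not stored anything under this name.
    if (runtime.namedObjects.getNames().exists(_ == name)) Good(name)
    else Bad(One(SingleProblem(s"Named object '$name' not found; run word1 in the same context first")))
  }
}
```

Named objects live inside a single context, so this only works when `word1` and the reading job are submitted to the same pre-created context rather than to separate ad-hoc contexts.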