
Registering Classes with Kryo via SparkSession in Spark 2+

I'm migrating from Spark 1.6 to 2.3.

I need to register custom classes with Kryo. So this is what I see in the tuning guide: https://spark.apache.org/docs/2.3.1/tuning.html#data-serialization

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.registerKryoClasses(Array(classOf[MyClass1], classOf[MyClass2]))
val sc = new SparkContext(conf)

The problem is... everywhere else in the Spark 2+ documentation, SparkSession is indicated as the way to go for everything, and if you need a SparkContext, it should be obtained through spark.sparkContext rather than as a stand-alone val.

So now I use the following (and have wiped any trace of conf, sc, etc. from my code)...

val spark = SparkSession.builder.appName("myApp").getOrCreate()

My question: where do I register classes with Kryo if I don't use SparkConf or SparkContext directly?

I see spark.kryo.classesToRegister here: https://spark.apache.org/docs/2.3.1/configuration.html#compression-and-serialization

I have a pretty extensive conf.json that sets spark-defaults.conf, but I want to keep it generalizable across apps, so I don't want to register classes there.

When I look here: https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.sql.SparkSession

It makes me think I can do something like the following to augment my spark-defaults.conf:

val spark = 
  SparkSession
    .builder
    .appName("myApp")
    .config("spark.kryo.classesToRegister", "???")
    .getOrCreate()

But what is ??? if I want to register org.myorg.myapp.{MyClass1, MyClass2, MyClass3}? I can't find an example of this usage.

Would it be:

.config("spark.kryo.classesToRegister", "MyClass1,MyClass2,MyClass3")

or

.config("spark.kryo.classesToRegister", "class org.myorg.mapp.MyClass1,class org.myorg.mapp.MyClass2,class org.myorg.mapp.MyClass3")

or something else?

EDIT

When I try testing different formats in spark-shell via spark.conf.set("spark.kryo.classesToRegister", "any,any2,any3"), I never get any error messages, no matter what I put in the string any,any2,any3.

I tried each of the following formats for any:

  • "org.myorg.myapp.myclass" “org.myorg.myapp.myclass”
  • "myclass" “我的课”
  • "class org.myorg.myapp.myclass" “class org.myorg.myapp.myclass”

I can't tell if any of these successfully registered anything.
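
One way I could make this observable (a sketch, assuming org.myorg.myapp.MyClass1 actually exists on the classpath) is to set spark.kryo.registrationRequired on the builder, so that serializing an unregistered class throws instead of silently falling back:

import org.apache.spark.sql.SparkSession

// Sketch: with registrationRequired=true, Kryo errors out when it serializes
// a class that was not registered, so a wrong name or format fails loudly.
val spark = SparkSession.builder
  .appName("kryoRegistrationCheck")
  .master("local[*]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.registrationRequired", "true")
  .config("spark.kryo.classesToRegister", "org.myorg.myapp.MyClass1")
  .getOrCreate()

Also, the silence above is probably because spark.conf.set on a running shell session doesn't affect serializer settings at all; those are read when the SparkContext starts, so they have to be in place before getOrCreate.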

Have you tried the following? It should work, since registerKryoClasses is actually part of the SparkConf API; I think the only thing missing is that you just need to plug it into the SparkSession:

  import org.apache.spark.SparkConf
  import org.apache.spark.sql.SparkSession

  private lazy val sparkConf = new SparkConf()
    .setAppName("spark_basic_rdd").setMaster("local[*]")
    .registerKryoClasses(...) // e.g. Array(classOf[MyClass1], classOf[MyClass2])
  private lazy val sparkSession = SparkSession.builder()
    .config(sparkConf).getOrCreate()

And if you need a SparkContext you can call: private lazy val sparkContext: SparkContext = sparkSession.sparkContext
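
For the string-based route the question asks about, the configuration docs describe spark.kryo.classesToRegister as a comma-separated list of class names, and in practice these are fully qualified names with no "class " prefix; a minimal sketch using the question's (assumed) classes:

import org.apache.spark.sql.SparkSession

// Assumes org.myorg.myapp.{MyClass1, MyClass2, MyClass3} are on the classpath.
// Kryo settings must be set before the underlying SparkContext is created.
val spark = SparkSession.builder
  .appName("myApp")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.classesToRegister",
    "org.myorg.myapp.MyClass1,org.myorg.myapp.MyClass2,org.myorg.myapp.MyClass3")
  .getOrCreate()

A misspelled name here should fail fast with a SparkException ("Failed to register classes with Kryo") once the serializer is first instantiated.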
