
Customize SparkContext using sparkConf.set(..) when using spark-shell

In Spark, there are 3 primary ways to specify the options for the SparkConf used to create the SparkContext:

  1. As properties in conf/spark-defaults.conf
    • e.g., the line: spark.driver.memory 4g
  2. As arguments to spark-shell or spark-submit
    • e.g., spark-shell --driver-memory 4g ...
  3. In your source code, by configuring a SparkConf instance before using it to create the SparkContext:
    • e.g., sparkConf.set("spark.driver.memory", "4g")

However, when using spark-shell, the SparkContext is already created for you by the time you get a shell prompt, in the variable named sc. When using spark-shell, how do you use option #3 in the list above to set configuration options, if the SparkContext is already created before you have a chance to execute any Scala statements?

In particular, I am trying to use Kryo serialization and GraphX. The prescribed way to use Kryo with GraphX is to execute the following Scala statement when customizing the SparkConf instance:

GraphXUtils.registerKryoClasses( sparkConf )

How do I accomplish this when running spark-shell?

Spark 2.0+

You should be able to use the SparkSession.conf.set method to set some configuration options at runtime, but it is mostly limited to SQL configuration.
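For example, a minimal sketch in the 2.0+ shell, where the session is already exposed as the variable spark (the option below is just an illustration of a SQL setting that can be changed at runtime):

// spark is the SparkSession created by spark-shell in 2.0+
spark.conf.set("spark.sql.shuffle.partitions", "50")
spark.conf.get("spark.sql.shuffle.partitions")   // returns "50"
// Core options such as spark.driver.memory still have to be set
// before the shell starts (defaults file or command-line arguments).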

Spark < 2.0

You can simply stop the existing context and create a new one:

import org.apache.spark.{SparkContext, SparkConf}

sc.stop()   // stop the context created by spark-shell
val conf = new SparkConf().set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)   // build a new context from the custom conf
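Applied to the Kryo and GraphX case from the question, a minimal sketch of the same pattern (assuming GraphX is on the classpath so that GraphXUtils is available) would be:

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx.GraphXUtils

sc.stop()
// Build a fresh conf, enable Kryo, and register GraphX's classes
// before creating the replacement context.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
GraphXUtils.registerKryoClasses(conf)
val sc = new SparkContext(conf)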

As you can read in the official documentation:

Once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.

So, as you can see, stopping the context is the only applicable option once the shell has been started.

You can always use configuration files or the --conf argument to spark-shell to set the required parameters, which will be used by the default context. In the case of Kryo you should take a look at:

  • spark.kryo.classesToRegister
  • spark.kryo.registrator

See Compression and Serialization in the Spark Configuration guide.
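For illustration, a sketch of launching spark-shell with Kryo configured up front; com.example.MyRegistrator is a hypothetical user-supplied KryoRegistrator class and should be replaced with your own (or use spark.kryo.classesToRegister with a comma-separated list of class names instead):

spark-shell \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=com.example.MyRegistrator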
