
Spark Cassandra tuning

How can I set the following Cassandra write parameters in Spark Scala code when using DataStax Spark Cassandra Connector 1.6.3?

Spark version: 1.6.2

spark.cassandra.output.batch.size.rows

spark.cassandra.output.concurrent.writes

spark.cassandra.output.batch.size.bytes

spark.cassandra.output.batch.grouping.key

Thanks, Chandra

In DataStax Spark Cassandra Connector 1.6.X, you can set these parameters on your SparkConf:

import org.apache.spark.{SparkConf, SparkContext}

// Example values only; tune them for your own cluster.
// Per the connector reference, batch.size.rows takes precedence over batch.size.bytes when both are set.
val conf = new SparkConf(true)
    .set("spark.cassandra.connection.host", "192.168.123.10")
    .set("spark.cassandra.auth.username", "cassandra")
    .set("spark.cassandra.auth.password", "cassandra")
    .set("spark.cassandra.output.batch.size.rows", "100")
    .set("spark.cassandra.output.concurrent.writes", "100")
    .set("spark.cassandra.output.batch.size.bytes", "100")
    .set("spark.cassandra.output.batch.grouping.key", "partition")

val sc = new SparkContext("spark://192.168.123.10:7077", "test", conf)

You can refer to this readme for more information.
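If the write settings need to be reused across several jobs, one option is to keep them together in a Map and apply them in a single call. The following is only a sketch: the object name CassandraWriteTuning, the helper withWriteSettings, and the values shown are illustrative assumptions, not part of the connector. It relies only on SparkConf.setAll and assumes spark-core is on the classpath.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper object; the name and the values are illustrative only.
object CassandraWriteTuning {
  // Write-tuning properties from the question, grouped in one place.
  val writeSettings: Map[String, String] = Map(
    "spark.cassandra.output.batch.size.bytes"   -> "1024",
    "spark.cassandra.output.concurrent.writes"  -> "5",
    "spark.cassandra.output.batch.grouping.key" -> "partition"
  )

  // Applies the write settings on top of an existing SparkConf and returns it.
  def withWriteSettings(conf: SparkConf): SparkConf =
    conf.setAll(writeSettings)
}
```

It could then be used as CassandraWriteTuning.withWriteSettings(new SparkConf(true).set("spark.cassandra.connection.host", "192.168.123.10")), and the resulting conf passed to the SparkContext as before.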

The most flexible way is to put these properties in a file, such as spark.conf:

spark.cassandra.output.concurrent.writes 10

and so on for the other properties. Then create your Spark context in your application with something like:

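import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf() // picks up the spark.* properties supplied at submit time
val sc = new SparkContext(conf)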

and finally, when you submit your app, you can specify your properties file with:

spark-submit --properties-file spark.conf ...

Spark will automatically read your configuration from spark.conf when creating the Spark context. That way, you can modify the properties in spark.conf without needing to recompile your code each time.
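To check that the properties file was actually picked up, the application can list the connector's output settings at startup. This is a minimal sketch under some assumptions: the object name ShowCassandraWriteSettings and the helper outputSettings are hypothetical, and it assumes spark-submit makes the file's spark.* entries visible to new SparkConf() in the driver.

```scala
import org.apache.spark.SparkConf

// Hypothetical diagnostic object; not part of Spark or the connector.
object ShowCassandraWriteSettings {
  // Returns every spark.cassandra.output.* entry present in the given conf.
  def outputSettings(conf: SparkConf): Map[String, String] =
    conf.getAll.filter { case (key, _) =>
      key.startsWith("spark.cassandra.output.")
    }.toMap

  def main(args: Array[String]): Unit = {
    // new SparkConf() loads spark.* values from the JVM system properties.
    val conf = new SparkConf()
    outputSettings(conf).toSeq.sorted.foreach { case (key, value) =>
      println(s"$key = $value")
    }
  }
}
```

If nothing is printed, the file was probably not passed to spark-submit, or the keys in it do not start with spark.cassandra.output.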
