
Sharing data across executors in Apache Spark

Original: 2018-12-18 04:51:01. Tags: java, apache-spark, apache-spark-dataset, apache-spark-2.0, apache-spark-2.3

My Spark project (written in Java) needs to access the results of SELECT queries on different tables from the executors.

One solution to this problem is:

Create a temp view.
Select the required columns.
Convert the DataFrame to a Map using forEach.
Pass that Map to the executors as a broadcast variable.
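The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: the view name `countries`, the columns `id` and `name`, and the class name `BroadcastMapSketch` are hypothetical and do not come from the question. The sketch collects the rows with `collectAsList()` on the driver rather than `Dataset.foreach`, because `foreach` runs on the executors and a Map filled there would not be visible on the driver.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class BroadcastMapSketch {

    // Steps 1-3: register a temp view, select two columns,
    // and collect the result into a Map on the driver.
    // The view and column names are hypothetical.
    static Map<Integer, String> lookupMap(SparkSession spark, Dataset<Row> table) {
        table.createOrReplaceTempView("countries");
        Dataset<Row> selected = spark.sql("SELECT id, name FROM countries");
        Map<Integer, String> map = new HashMap<>();
        for (Row row : selected.collectAsList()) {
            map.put(row.getInt(0), row.getString(1));
        }
        return map;
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("broadcast-map-sketch")
                .getOrCreate();

        // A tiny in-memory table standing in for the real one.
        StructType schema = new StructType()
                .add("id", DataTypes.IntegerType)
                .add("name", DataTypes.StringType);
        Dataset<Row> table = spark.createDataFrame(
                Arrays.asList(RowFactory.create(1, "France"), RowFactory.create(2, "Japan")),
                schema);

        Map<Integer, String> map = lookupMap(spark, table);

        // Step 4: ship the Map to every executor as a broadcast variable.
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        Broadcast<Map<Integer, String>> lookup = jsc.broadcast(map);

        // Executor-side code reads the broadcast value inside the lambda.
        Dataset<Integer> ids = spark.createDataset(Arrays.asList(1, 2, 3), Encoders.INT());
        Dataset<String> names = ids.map(
                (MapFunction<Integer, String>) id -> lookup.value().getOrDefault(id, "unknown"),
                Encoders.STRING());
        names.show();

        spark.stop();
    }
}
```

Because the whole Map is collected on the driver and then copied to each executor, this pattern only suits tables that fit comfortably in memory.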

However, I have found that:

There are many complex queries whose results can't be stored directly in a Map.
The tables are very large, so creating a large Map and passing it to the executors as a broadcast variable doesn't sound efficient.

Instead, can we load the tables into memory (using load) so that they can be shared across executors?

Is the method void org.apache.spark.sql.Dataset.createOrReplaceTempView(String viewName)

or void org.apache.spark.sql.Dataset.createGlobalTempView(String viewName) throws AnalysisException

useful for this purpose?

Spark version: 2.3.0
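For reference, the two view methods named in the question can be called as in the sketch below. A temp view is scoped to the SparkSession that created it, while a global temp view is scoped to the Spark application and is queried through the `global_temp` database. Both only register a name for a Dataset in the catalog; neither copies data to the executors. The view names `ids_local` and `ids_global` and the class name `TempViewSketch` are hypothetical.

```java
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TempViewSketch {

    // Registers the same Dataset under a session-scoped and an
    // application-scoped view, then counts the rows through each view.
    static long[] countThroughViews(SparkSession spark) {
        Dataset<Row> table = spark.range(5).toDF("id");

        // Visible only in the session that created it.
        table.createOrReplaceTempView("ids_local");

        // Visible in every session of this application.
        // Declared to throw AnalysisException if the name is already taken.
        try {
            table.createGlobalTempView("ids_global");
        } catch (AnalysisException e) {
            throw new IllegalStateException("global temp view already exists", e);
        }

        long local = spark.sql("SELECT * FROM ids_local").count();

        // A second session cannot see ids_local, but it can read the
        // global view through the global_temp database.
        long global = spark.newSession()
                .sql("SELECT * FROM global_temp.ids_global")
                .count();

        return new long[] {local, global};
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("temp-view-sketch")
                .getOrCreate();

        long[] counts = countThroughViews(spark);
        System.out.println("local view rows: " + counts[0]);
        System.out.println("global view rows: " + counts[1]);

        spark.stop();
    }
}
```

Queries against either view still run as ordinary distributed jobs, so the views control where a name is visible rather than where the data lives.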

1 answer

You can broadcast a DataFrame. See the documentation.
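The link in the answer did not survive, so the following is only one plausible reading of it: the `broadcast` function in `org.apache.spark.sql.functions`, which marks a DataFrame as small enough to be sent to every executor for a join. The column names, sample data, and the class name `BroadcastJoinSketch` are hypothetical.

```java
import static org.apache.spark.sql.functions.broadcast;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class BroadcastJoinSketch {

    // Joins a large DataFrame with a small one on the "id" column,
    // hinting that the small side should be broadcast to the executors.
    static Dataset<Row> joinWithBroadcast(Dataset<Row> large, Dataset<Row> small) {
        return large.join(broadcast(small), "id");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("broadcast-join-sketch")
                .getOrCreate();

        // Stand-in for a large table: ids 0 through 9.
        Dataset<Row> large = spark.range(10).toDF("id");

        // Stand-in for a small lookup table.
        StructType schema = new StructType()
                .add("id", DataTypes.LongType)
                .add("label", DataTypes.StringType);
        Dataset<Row> small = spark.createDataFrame(
                Arrays.asList(RowFactory.create(1L, "a"), RowFactory.create(2L, "b")),
                schema);

        Dataset<Row> joined = joinWithBroadcast(large, small);
        joined.show();

        spark.stop();
    }
}
```

Unlike the Map approach, the lookup data stays a DataFrame here, so results of complex queries do not need to be reshaped into key/value pairs. The small side is still copied to every executor, so it has to fit in memory.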

Related questions

Executors and cores in Apache Spark

Sharing Zookeeper configuration on multiple Spark Executors

Sharing data across 2 classes

In Spark, is it possible to share data between two executors?

Apache Spark take Action on Executors in fully distributed mode

Sharing ApplicationContext with Executors

Sharing data across Spring MVC Controller methods

Sharing data across multiple instances of a java application

Sharing JMS deadLetterChannel across direct route with Transaction in Apache Camel

Spark OutOfMemoryError when adding executors


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 1 ACCEPTED 2018-12-18 06:28:40