My Spark project (written in Java) needs to access different tables (via SELECT query results) across executors.
One solution to this problem is to convert the DataFrame to a Map. However, I have found that creating a Map of large size and passing it to executors as a broadcast variable doesn't sound efficient. Instead, can we load the tables in memory using load so that they can be shared across executors?
Is either of these methods useful for this purpose?
void org.apache.spark.sql.Dataset.createOrReplaceTempView(String viewName)
void org.apache.spark.sql.Dataset.createGlobalTempView(String viewName) throws AnalysisException
Spark version: 2.3.0
You can broadcast a DataFrame. See the documentation.
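As a minimal sketch of what broadcasting a DataFrame can look like in Java, the example below uses org.apache.spark.sql.functions.broadcast, which marks a Dataset as small enough for a broadcast join. The class name BroadcastJoinSketch, the helper joinWithLookup, the column names, and the generated data are all made up for illustration; a real job would read its tables from an actual source. It assumes the lookup table fits comfortably in each executor's memory.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.broadcast;
import static org.apache.spark.sql.functions.col;

public class BroadcastJoinSketch {

    // Hypothetical helper: joins a large table against a small lookup table.
    // broadcast(...) is a hint asking Spark to ship the lookup table to the
    // executors instead of shuffling both sides of the join.
    static Dataset<Row> joinWithLookup(Dataset<Row> facts, Dataset<Row> lookup) {
        return facts.join(broadcast(lookup), "id");
    }

    public static void main(String[] args) {
        // Local session for illustration only; a cluster job would normally
        // get its master from spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("broadcast-join-sketch")
                .master("local[*]")
                .getOrCreate();

        // Made-up data: a "large" table with ids 0..999 and a small
        // lookup table with ids 0..9 plus one derived column.
        Dataset<Row> facts = spark.range(0, 1000).toDF("id");
        Dataset<Row> lookup = spark.range(0, 10).toDF("id")
                .withColumn("squared", col("id").multiply(col("id")));

        Dataset<Row> joined = joinWithLookup(facts, lookup);
        joined.show();

        spark.stop();
    }
}
```

With this approach the lookup data stays a DataFrame, so there is no need to collect it into a Map on the driver first. Regarding the methods mentioned in the question: a temp view registers a name for a Dataset so it can be queried with SQL, but registering the view does not by itself load the data into memory.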