How to submit spark job from within java program to standalone spark cluster without using spark-submit?
Submit Spark job in Azure Synapse from Java
Azure Synapse provides managed Spark pools, to which Spark jobs can be submitted.
For (1):
Add the following dependencies:
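<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-analytics-synapse-spark</artifactId>
    <version>1.0.0-beta.4</version>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <!-- No version is given here; Maven needs one, either added explicitly or managed through a BOM such as azure-sdk-bom. -->
</dependency>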
Use the following sample code:
import com.azure.analytics.synapse.spark.SparkBatchClient;
import com.azure.analytics.synapse.spark.SparkClientBuilder;
import com.azure.analytics.synapse.spark.models.SparkBatchJob;
import com.azure.analytics.synapse.spark.models.SparkBatchJobOptions;
import com.azure.identity.DefaultAzureCredentialBuilder;

import java.util.*;

public class SynapseService {
    private final SparkBatchClient batchClient;

    public SynapseService() {
        batchClient = new SparkClientBuilder()
            .endpoint("https://xxxx.dev.azuresynapse.net/")
            .sparkPoolName("TestPool")
            .credential(new DefaultAzureCredentialBuilder().build())
            .buildSparkBatchClient();
    }

    /**
     * Submits a Spark batch job to the configured Spark pool.
     */
    public SparkBatchJob submitSparkJob(String name, String mainFile, String mainClass, List<String> arguments, List<String> jars) {
        SparkBatchJobOptions options = new SparkBatchJobOptions()
            .setName(name)
            .setFile(mainFile)
            .setClassName(mainClass)
            .setArguments(arguments)
            .setJars(jars)
            .setExecutorCount(3)
            .setExecutorCores(4)
            .setDriverCores(4)
            .setDriverMemory("6G")
            .setExecutorMemory("6G");
        return batchClient.createSparkBatchJob(options);
    }

    /**
     * Fetches a Spark batch job by id.
     *
     * All possible Livy states: https://docs.microsoft.com/en-us/rest/api/synapse/data-plane/spark-batch/get-spark-batch-jobs#livystates
     * Some of the values: busy, dead, error, idle, killed, not_started, recovering, running, shutting_down, starting, success
     *
     * @param id id of the Synapse Spark batch job
     * @param detailed whether to request a detailed response
     * @return the Spark batch job, including its current state
     */
    public SparkBatchJob getSparkJob(int id, boolean detailed) {
        return batchClient.getSparkBatchJob(id, detailed);
    }

    /**
     * Cancels the ongoing Synapse Spark job.
     *
     * @param jobId id of the Synapse job
     */
    public void cancelSparkJob(int jobId) {
        batchClient.cancelSparkBatchJob(jobId);
    }
}
Then submit the Spark job:
SynapseService synapse = new SynapseService();
synapse.submitSparkJob("TestJob",
    "abfss://builds@xxxx.dfs.core.windows.net/core/jars/main-module_2.12-1.0.jar",
    "com.xx.Main",
    Collections.emptyList(),
    Arrays.asList("abfss://builds@xxxx.dfs.core.windows.net/core/jars/*"));
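Submission returns as soon as the job is created, so a caller usually has to poll the job state until it stops changing. The following is only a minimal sketch of such a polling loop, not part of the Azure SDK: `JobStatePoller` and `waitForTerminalState` are hypothetical names, the state is passed in as a plain lowercase string through a `Supplier`, and treating `success`, `dead`, `error` and `killed` as the terminal states is an assumption based on the Livy state list quoted in the Javadoc above.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper (not part of the Azure SDK): polls a job state until it is terminal.
final class JobStatePoller {
    // Assumption: these Livy states mean the batch job will not change state any more.
    static final Set<String> TERMINAL_STATES = Set.of("success", "dead", "error", "killed");

    /**
     * Calls stateSupplier up to maxAttempts times, sleeping for delay between calls,
     * and returns the first terminal state seen, or the last observed state otherwise.
     */
    public static String waitForTerminalState(Supplier<String> stateSupplier, int maxAttempts, Duration delay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String state = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            state = stateSupplier.get();
            if (state != null && TERMINAL_STATES.contains(state)) {
                return state;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for the job", e);
                }
            }
        }
        return state;
    }
}
```

With the `SynapseService` class above, the supplier would be a lambda that calls `getSparkJob` for the submitted job's id and turns the state of the returned `SparkBatchJob` into a lowercase string; how that state is read depends on the SDK version and is left out here.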
Finally, you need to grant the necessary role:
Synapse Compute Operator
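To answer question 2:
When jobs are submitted to Synapse as jars, each submission is equivalent to a spark-submit. The jobs are therefore agnostic of one another and do not share each other's dependencies.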