Submit Spark job in Azure Synapse from Java

Azure Synapse provides a managed Spark pool to which Spark jobs can be submitted.

  1. How do I submit a Spark job (as JARs) along with its dependencies to the pool using Java?
  2. If multiple jobs are submitted (each with its own set of dependencies), are the dependencies shared across the jobs, or are they agnostic of each other?

For (1):

Add the following dependencies:

    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-analytics-synapse-spark</artifactId>
        <version>1.0.0-beta.4</version>
    </dependency>
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-identity</artifactId>
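        <!-- no version element: assumes dependency versions are managed by the azure-sdk-bom; pin a version explicitly otherwise -->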
    </dependency>

With the sample code below:

import com.azure.analytics.synapse.spark.SparkBatchClient;
import com.azure.analytics.synapse.spark.SparkClientBuilder;
import com.azure.analytics.synapse.spark.models.SparkBatchJob;
import com.azure.analytics.synapse.spark.models.SparkBatchJobOptions;
import com.azure.identity.DefaultAzureCredentialBuilder;

import java.util.*;

public class SynapseService {
    private final SparkBatchClient batchClient;

    public SynapseService() {
        batchClient = new SparkClientBuilder()
                .endpoint("https://xxxx.dev.azuresynapse.net/")
                .sparkPoolName("TestPool")
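                // DefaultAzureCredential tries environment variables, managed identity, Azure CLI login, etc. in order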
                .credential(new DefaultAzureCredentialBuilder().build())
                .buildSparkBatchClient();
    }

    public SparkBatchJob submitSparkJob(String name, String mainFile, String mainClass, List<String> arguments, List<String> jars) {
        SparkBatchJobOptions options = new SparkBatchJobOptions()
                .setName(name)
                .setFile(mainFile)
                .setClassName(mainClass)
                .setArguments(arguments)
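                // JARs listed here are distributed with this job only; jobs do not share dependencies (see (2) below)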
                .setJars(jars)
                .setExecutorCount(3)
                .setExecutorCores(4)
                .setDriverCores(4)
                .setDriverMemory("6G")
                .setExecutorMemory("6G");
        return batchClient.createSparkBatchJob(options);
    }

    /**
     * All possible Livy states: https://docs.microsoft.com/en-us/rest/api/synapse/data-plane/spark-batch/get-spark-batch-jobs#livystates
     *
     * Some of the values: busy, dead, error, idle, killed, not_started, recovering, running, shutting_down, starting, success
     * @param id id of the Synapse Spark batch job
     * @param detailed whether to return detailed information about the job
     * @return the job, including its current Livy state
     */
    public SparkBatchJob getSparkJob(int id, boolean detailed) {
        return batchClient.getSparkBatchJob(id, detailed);
    }


    /**
     * Cancels an ongoing Synapse Spark job.
     * @param jobId id of the Synapse job
     */
    public void cancelSparkJob(int jobId) {
        batchClient.cancelSparkBatchJob(jobId);
    }

}

And finally, submit the Spark job:

SynapseService synapse = new SynapseService();
SparkBatchJob job = synapse.submitSparkJob("TestJob",
        "abfss://builds@xxxx.dfs.core.windows.net/core/jars/main-module_2.12-1.0.jar",
        "com.xx.Main",
        Collections.emptyList(),
        Arrays.asList("abfss://builds@xxxx.dfs.core.windows.net/core/jars/*"));
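
Submission is asynchronous: createSparkBatchJob returns as soon as the batch is created, not when it finishes. If you need to wait for the outcome, you can poll the Livy state documented above until it reaches a terminal value. A minimal sketch built on the SynapseService class from this answer (JobWaiter is a hypothetical helper; the terminal-state names come from the Livy states list, and String.valueOf is used because the return type of getState() can differ between SDK versions):

import com.azure.analytics.synapse.spark.models.SparkBatchJob;

import java.util.Set;

// Hypothetical helper: polls a submitted batch job until it reaches a terminal Livy state.
public final class JobWaiter {
    // Terminal states taken from the Livy states list linked in the Javadoc above
    private static final Set<String> TERMINAL_STATES = Set.of("success", "dead", "error", "killed");

    public static String waitForCompletion(SynapseService synapse, int jobId) throws InterruptedException {
        while (true) {
            SparkBatchJob job = synapse.getSparkJob(jobId, false);
            String state = String.valueOf(job.getState());
            if (TERMINAL_STATES.contains(state)) {
                return state;
            }
            Thread.sleep(10_000); // re-check every 10 seconds
        }
    }
}

For example, after the submission above: String finalState = JobWaiter.waitForCompletion(synapse, job.getId());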

Finally, you will need to grant the necessary role:

  1. Open Synapse Analytics Studio
  2. Manage -> Access Control
  3. Grant the Synapse Compute Operator role to the caller

To answer question (2):

When jobs are submitted in Synapse via JARs, they are equivalent to a spark-submit. So all the jobs are agnostic of each other and do not share each other's dependencies; each submission must carry its own set of JARs.
