
Spark requests more cores than asked for when calling the POST Livy batch API in Azure Synapse

I have an Azure Synapse Spark cluster with 3 nodes, each with 4 vCores and 32 GB of memory. I am trying to submit a Spark job using the Azure Synapse Livy batch APIs. The request looks like this:

curl --location --request POST 'https://<synapse-workspace>.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/<pool-name>/batches?detailed=true' `
--header 'cache-control: no-cache' `
--header 'Authorization: Bearer <Token>' `
--header 'Content-Type: application/json' `
--data-raw '{
    "name": "T1",
    "file": "folder/file.py",
    "driverMemory": "1g",
    "driverCores": 1,
    "executorMemory": "1g",
    "executorCores":1,
    "numExecutors": 3
}'

The response I get is this:

{
    "TraceId": "<some-guid>",
    "Message": "Your Spark job requested 16 vcores. However, the pool has a 12 core limit. Try reducing the numbers of vcores requested or increasing your pool size."
}

I cannot figure out why it is asking for 16 cores. Shouldn't it ask for 4 (3 * 1 + 1) cores?

Update: I tried changing the node pool size to 3 nodes, each with 8 vCores and 64 GB of memory. With this configuration,

{
    "name": "T1",
    "file": "folder/file.py",
    "driverMemory": "1g",
    "driverCores": 1,
    "executorMemory": "1g",
    "executorCores": 1,
    "numExecutors": 6
}

it requests 28 cores (even for executorCores 2, 3, or 4). And if I change executorCores to 5, 6, 7, or 8, it requests 56 cores.

From the portal there is no way to do what you are trying to do.

But you can still submit a Spark job by specifying the driver (cores and memory) and executor (cores and memory) explicitly. For example, with something like this: Submit Spark job in Azure Synapse from Java
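As a rough sketch of the same approach, the snippet below posts a batch job to the Synapse Livy endpoint from Python instead of Java, with the driver and executor cores and memory spelled out explicitly. The workspace name, pool name, token, and file path are placeholders, and the payload simply mirrors the curl request at the top of the question, so treat it as an illustration rather than a drop-in solution.

import json
import requests

# Placeholders -- substitute your own workspace, pool, token, and file path.
workspace = "<synapse-workspace>"
pool = "<pool-name>"
token = "<Token>"  # an Azure AD bearer token for the Synapse dev endpoint

url = (f"https://{workspace}.dev.azuresynapse.net/livyApi/versions/"
       f"2019-11-01-preview/sparkPools/{pool}/batches?detailed=true")

# Same shape as the curl payload above: driver/executor sizes set per job.
payload = {
    "name": "T1",
    "file": "folder/file.py",
    "driverMemory": "1g",
    "driverCores": 1,
    "executorMemory": "1g",
    "executorCores": 1,
    "numExecutors": 3,
}

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}",
             "Content-Type": "application/json"},
    data=json.dumps(payload),
)
print(resp.status_code, resp.text)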

Using the above code, I am able to submit 9 concurrent jobs (each with 1 driver and 1 executor, both consuming a single core) on 3 Medium nodes (8 cores each, though only 7 are available for use, as 1 is reserved for the hadoop daemon).
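For reference, each of those concurrent jobs can use a request body along the following lines (a sketch assuming the same fields as the question's payload, not the exact body from the linked answer). With 1 driver core and 1 executor core per job, 9 jobs account for 18 of the 21 usable cores (3 nodes × 7 cores), which is consistent with the 9 concurrent jobs observed.

{
    "name": "job-1",
    "file": "folder/file.py",
    "driverMemory": "1g",
    "driverCores": 1,
    "executorMemory": "1g",
    "executorCores": 1,
    "numExecutors": 1
}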

