Spark requests more cores than asked for when calling the POST Livy batch API in Azure Synapse
I have an Azure Synapse Spark cluster with 3 nodes, each with 4 vCores and 32 GB of memory. I am trying to submit a Spark job using the Azure Synapse Livy batch API. The request looks like this:
curl --location --request POST 'https://<synapse-workspace>.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/<pool-name>/batches?detailed=true' `
--header 'cache-control: no-cache' `
--header 'Authorization: Bearer <Token>' `
--header 'Content-Type: application/json' `
--data-raw '{
"name": "T1",
"file": "folder/file.py",
"driverMemory": "1g",
"driverCores": 1,
"executorMemory": "1g",
"executorCores":1,
"numExecutors": 3
}'
The response I get is this:
{
"TraceId": "<some-guid>",
"Message": "Your Spark job requested 16 vcores. However, the pool has a 12 core limit. Try reducing the numbers of vcores requested or increasing your pool size."
}
I cannot figure out why it is asking for 16 cores. Shouldn't it ask for 4 (3 × 1 + 1) cores?
Update: I tried changing the pool to 3 nodes, each with 8 vCores and 64 GB of memory. With this configuration,
{
"name": "T1",
"file": "folder/file.py",
"driverMemory": "1g",
"driverCores": 1,
"executorMemory": "1g",
"executorCores": 1,
"numExecutors": 6
}
it requests 28 cores (even with executorCores set to 2, 3, or 4). And if I change executorCores to 5, 6, 7, or 8, it requests 56 cores.
There is no way to do what you are trying to do from the portal.
But you can still submit a Spark job by specifying the driver (cores and memory) and executor (cores and memory), for example with something like this: Submit Spark job in Azure Synapse from Java
Using the above code, I was able to submit 9 concurrent jobs (each with 1 driver and 1 executor, both consuming a single core) on 3 Medium nodes (8 cores each, though only 7 are available for use, as 1 is reserved for the Hadoop daemon).