
Azure Synapse Notebook code to retrieve spark pool tags

When running a PySpark notebook interactively or in a pipeline, how do you retrieve the Spark pool tags? Please provide a code example. Thanks.

The answer to this question is not simple, since Spark is open-source code while Azure resource tags are exposed through Azure's web services.

I will walk you through my thought process and how I solved this problem.

First, the Spark session contains the name of the cluster the notebook is running under in Synapse. The following code retrieves this name.

%%pyspark

#
# Get spark pool name
#

# Import library
from pyspark.context import SparkContext

# Create context
sc = SparkContext.getOrCreate()

# Get configuration as a list of (key, value) tuples
tuples = sc.getConf().getAll()

# Find spark pool name
for element in tuples:
    if element[0].find('spark.synapse.pool.name') != -1:
        print(element[0])
        print(element[1])
        print("")

Here is the output from the execution.
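As a side note, since the configuration key is known exactly, the scan above can also be written as a direct dictionary lookup. A minimal sketch, with a stand-in list in place of a live `sc.getConf().getAll()` result:

```python
# Stand-in for sc.getConf().getAll(); in a live Synapse session this
# list of (key, value) tuples comes from the Spark configuration.
conf_pairs = [
    ("spark.app.name", "notebook"),
    ("spark.synapse.pool.name", "asp4synapse"),
]

# An exact-key dictionary lookup replaces the substring scan.
pool_name = dict(conf_pairs).get("spark.synapse.pool.name")
print(pool_name)   # -> asp4synapse
```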


The next task is to add a tag to the existing Spark pool. My tag is called "spark_overflow_question" and its value is "yes". This is the key-value pair.


Since the Spark context does not contain this tag information, we have to turn to the Azure tools to get it.

The Azure Command Line Interface is one step up from a raw REST API call. I am going to do a quick test to make sure the list command returns the information that I want.


We can see that using a REST API call will work.

1 - We need to create a service principal that has Microsoft Graph user-read privileges. I am adding two MSDN links to accomplish this task.

https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal

https://docs.microsoft.com/en-us/graph/migrate-azure-ad-graph-configure-permissions?tabs=powershell

2 - We need to write code to log into Azure using the service principal and return an access token (bearer token).

%%pyspark

#
# 2 - Get access token
#

# Import library
import adal

# Key information (parameters)
tenant_id = 'your tenant id'
client_id = 'your client id'
client_secret = 'your client secret'
subscription_id = 'your subscription id'

# Microsoft login url
authority_url = 'https://login.microsoftonline.com/' + tenant_id
context = adal.AuthenticationContext(authority_url)

# Ask for access token
token = context.acquire_token_with_client_credentials(
    resource = 'https://management.azure.com/',
    client_id = client_id,
    client_secret = client_secret
)

# Show token
print(token["accessToken"])

If everything works correctly, you should get back a large string of characters: the access token. I am only showing a portion to show it worked.
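Note that the `adal` library used above has since been deprecated in favor of MSAL-based libraries. Here is a minimal sketch of the same token request using the `azure-identity` package's `ClientSecretCredential` class; the credential values are placeholders you must substitute, and the function is only a sketch of the call shape:

```python
# Hypothetical placeholder values -- substitute your own.
tenant_id = "your tenant id"
client_id = "your client id"
client_secret = "your client secret"

# MSAL-style authority URL; the ".default" scope replaces the plain
# resource URL that adal used.
authority = "https://login.microsoftonline.com/" + tenant_id
scope = "https://management.azure.com/.default"

def get_access_token():
    # Requires the azure-identity package to be installed.
    from azure.identity import ClientSecretCredential
    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    return credential.get_token(scope).token
```

The returned string can be used in place of `token["accessToken"]` in the REST call below.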


3 - The last step is to make a REST API call to return the information that we want. The code below does just that. I am including the MSDN reference to the API.

https://docs.microsoft.com/en-us/rest/api/synapse/big-data-pools

%%pyspark

#
# 3 - List pool properties
#

# libraries
import requests
import json

# azure object info
sub_id = "your subscription id"
rg_name = "rg4synapse"
ws_name = "wsn4synapse"
sp_name = "asp4synapse"

# management url
url = ""
url += "https://management.azure.com/subscriptions/{}/".format(sub_id)
url += "resourceGroups/{}/providers/Microsoft.Synapse/".format(rg_name)
url += "workspaces/{}/".format(ws_name)
url += "bigDataPools/{}".format(sp_name)

# access token + api version
headers = {'Authorization': 'Bearer ' + token['accessToken'], 'Content-Type': 'application/json'}
params = {'api-version': '2021-06-01'}

# make rest api call
r = requests.get(url, headers=headers, params=params)

# show the results
print(json.dumps(r.json(), indent=4, separators=(',', ': ')))

I chose to place the resulting JSON document in this post as code. It is a lot easier to see the whole string.

{
    "properties": {
        "creationDate": "2021-09-13T19:46:27.95Z",
        "sparkVersion": "2.4",
        "nodeCount": 3,
        "nodeSize": "Small",
        "nodeSizeFamily": "MemoryOptimized",
        "autoScale": {
            "enabled": false,
            "minNodeCount": 3,
            "maxNodeCount": 3
        },
        "autoPause": {
            "enabled": true,
            "delayInMinutes": 15
        },
        "isComputeIsolationEnabled": false,
        "sessionLevelPackagesEnabled": true,
        "cacheSize": 0,
        "dynamicExecutorAllocation": {
            "enabled": false
        },
        "lastSucceededTimestamp": "2022-09-04T18:35:54.55Z",
        "isAutotuneEnabled": false,
        "provisioningState": "Succeeded"
    },
    "id": "/subscriptions/792f5db5-2798-4365-ba7b-e5812052a8d0/resourceGroups/rg4synapse/providers/Microsoft.Synapse/workspaces/wsn4synapse/bigDataPools/asp4synapse",
    "name": "asp4synapse",
    "type": "Microsoft.Synapse/workspaces/bigDataPools",
    "location": "eastus2",
    "tags": {
        "spark_overflow_question": "yes"
    }
}

Let's review the steps to make this happen.

1 - Use the Spark session to identify which cluster is being used by the notebook.

2 - Have a service principal defined with access to read Microsoft Graph.

3 - Log into Azure using the service principal to grab an access token.

4 - Make the REST API call with the access token and cluster name to return the tag properties.

In short, this solves your problem.

I leave parsing the JSON document to you. Just a hint: look at this link.

https://www.geeksforgeeks.org/json-loads-in-python/
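As a hint of what that parsing looks like, here is a minimal sketch that pulls the `tags` dictionary out of an abbreviated copy of the JSON document above:

```python
import json

# Abbreviated copy of the JSON document returned by the bigDataPools call.
response_text = '''
{
    "name": "asp4synapse",
    "location": "eastus2",
    "tags": {
        "spark_overflow_question": "yes"
    }
}
'''

# json.loads converts the raw string into nested Python dictionaries.
document = json.loads(response_text)

# The tags are a plain key/value dictionary on the top-level object.
tags = document.get("tags", {})
print(tags.get("spark_overflow_question"))   # -> yes
```

With the `requests` call from step 3 you can skip the explicit `json.loads`, since `r.json()` returns the same nested structure directly.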

