
Azure Synapse Notebook code to retrieve spark pool tags

When running a PySpark notebook interactively or in a pipeline, how do you retrieve the Spark pool tags? Please provide a code example. Thanks.

The answer to this question is not simple, since Spark is open-source code while Azure resource tags are exposed through Azure's web services.

I will walk you through my thought process and how I solved this problem.

First, the Spark session contains the name of the cluster the notebook is running under in Synapse. The following code retrieves this name.

%%pyspark

#
# Get spark pool name
#

# Import library
from pyspark.context import SparkContext

# Create context
sc = SparkContext.getOrCreate()

# Get configuration as a list of (key, value) tuples
tuples = sc.getConf().getAll()

# Find spark pool name
for element in tuples:
    if element[0].find('spark.synapse.pool.name') != -1:
        print(element[0])
        print(element[1])
        print("")

Here is the output from the execution.
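As a side note, since the configuration key is known exactly, the scan above can also be written as a direct dictionary lookup. A minimal sketch, with a stand-in list in place of a live `sc.getConf().getAll()` result:

```python
# Stand-in for sc.getConf().getAll(); in a live Synapse session this
# list of (key, value) tuples comes from the Spark configuration.
conf_pairs = [
    ("spark.app.name", "notebook"),
    ("spark.synapse.pool.name", "asp4synapse"),
]

# An exact-key dictionary lookup replaces the substring scan.
pool_name = dict(conf_pairs).get("spark.synapse.pool.name")
print(pool_name)   # -> asp4synapse
```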


The next task is to add a tag to the existing Spark pool. My tag is called "spark_overflow_question" and its value is "yes". This is the key-value pair.


Since the Spark context does not contain this tag information, we have to turn to the Azure tools to get it.

The Azure Command Line Interface is one step up from a raw REST API call. I am going to do a quick test to make sure the list command returns the information that I want.


We can see that using a REST API call will work.

1 - We need to create a service principal that has Microsoft Graph user-read privileges. I am adding two MSDN links to accomplish this task.

https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal

https://docs.microsoft.com/en-us/graph/migrate-azure-ad-graph-configure-permissions?tabs=powershell

2 - We need to write code to log into Azure using the service principal and return an access token (bearer token).

%%pyspark

#
# 2 - Get access token
#

# Import library
import adal

# Key information (parameters)
tenant_id = 'your tenant id'
client_id = 'your client id'
client_secret = 'your client secret'
subscription_id = 'your subscription id'

# Microsoft login url
authority_url = 'https://login.microsoftonline.com/' + tenant_id
context = adal.AuthenticationContext(authority_url)

# Ask for access token
token = context.acquire_token_with_client_credentials(
    resource = 'https://management.azure.com/',
    client_id = client_id,
    client_secret = client_secret
)

# Show token
print(token["accessToken"])

If everything works correctly, you should get back a large string of characters: the access token. I am only showing a portion to show it worked.
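Note that the `adal` library used above has since been deprecated in favor of MSAL-based libraries. Here is a minimal sketch of the same token request using the `azure-identity` package's `ClientSecretCredential` class; the credential values are placeholders you must substitute, and the function is only a sketch of the call shape:

```python
# Hypothetical placeholder values -- substitute your own.
tenant_id = "your tenant id"
client_id = "your client id"
client_secret = "your client secret"

# MSAL-style authority URL; the ".default" scope replaces the plain
# resource URL that adal used.
authority = "https://login.microsoftonline.com/" + tenant_id
scope = "https://management.azure.com/.default"

def get_access_token():
    # Requires the azure-identity package to be installed.
    from azure.identity import ClientSecretCredential
    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    return credential.get_token(scope).token
```

The returned string can be used in place of `token["accessToken"]` in the REST call below.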


3 - The last step is to make a REST API call to return the information that we want. The code below does just that. I am including the MSDN reference to the API.

https://docs.microsoft.com/en-us/rest/api/synapse/big-data-pools

%%pyspark

#
# 3 - List pool properties
#

# libraries
import requests
import json

# azure object info
sub_id = "your subscription id"
rg_name = "rg4synapse"
ws_name = "wsn4synapse"
sp_name = "asp4synapse"

# management url
url = ""
url += "https://management.azure.com/subscriptions/{}/".format(sub_id)
url += "resourceGroups/{}/providers/Microsoft.Synapse/".format(rg_name)
url += "workspaces/{}/".format(ws_name)
url += "bigDataPools/{}".format(sp_name)

# access token + api version
headers = {'Authorization': 'Bearer ' + token['accessToken'], 'Content-Type': 'application/json'}
params = {'api-version': '2021-06-01'}

# make rest api call
r = requests.get(url, headers=headers, params=params)

# show the results
print(json.dumps(r.json(), indent=4, separators=(',', ': ')))

I chose to place the resulting JSON document in this post as code. It is a lot easier to see the whole string.

{
    "properties": {
        "creationDate": "2021-09-13T19:46:27.95Z",
        "sparkVersion": "2.4",
        "nodeCount": 3,
        "nodeSize": "Small",
        "nodeSizeFamily": "MemoryOptimized",
        "autoScale": {
            "enabled": false,
            "minNodeCount": 3,
            "maxNodeCount": 3
        },
        "autoPause": {
            "enabled": true,
            "delayInMinutes": 15
        },
        "isComputeIsolationEnabled": false,
        "sessionLevelPackagesEnabled": true,
        "cacheSize": 0,
        "dynamicExecutorAllocation": {
            "enabled": false
        },
        "lastSucceededTimestamp": "2022-09-04T18:35:54.55Z",
        "isAutotuneEnabled": false,
        "provisioningState": "Succeeded"
    },
    "id": "/subscriptions/792f5db5-2798-4365-ba7b-e5812052a8d0/resourceGroups/rg4synapse/providers/Microsoft.Synapse/workspaces/wsn4synapse/bigDataPools/asp4synapse",
    "name": "asp4synapse",
    "type": "Microsoft.Synapse/workspaces/bigDataPools",
    "location": "eastus2",
    "tags": {
        "spark_overflow_question": "yes"
    }
}

Let's review the steps to make this happen.

1 - Use the Spark session to identify which cluster is being used by the notebook.

2 - Have a service principal defined with access to read Microsoft Graph.

3 - Log into Azure using the service principal to grab an access token.

4 - Make the REST API call with the access token and cluster name to return the tag properties.

In short, this solves your problem.

I leave parsing the JSON document to you. Just a hint: look at this link.

https://www.geeksforgeeks.org/json-loads-in-python/
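As a hint of what that parsing looks like, here is a minimal sketch that pulls the `tags` dictionary out of an abbreviated copy of the JSON document above:

```python
import json

# Abbreviated copy of the JSON document returned by the bigDataPools call.
response_text = '''
{
    "name": "asp4synapse",
    "location": "eastus2",
    "tags": {
        "spark_overflow_question": "yes"
    }
}
'''

# json.loads converts the raw string into nested Python dictionaries.
document = json.loads(response_text)

# The tags are a plain key/value dictionary on the top-level object.
tags = document.get("tags", {})
print(tags.get("spark_overflow_question"))   # -> yes
```

With the `requests` call from step 3 you can skip the explicit `json.loads`, since `r.json()` returns the same nested structure directly.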

