转换和读取 Azure Synapse notebook 中的 json 文件

Question

I have below Json in one of my storage account and I am able to read it by following the below code.我的一个存储帐户中有以下 Json ，我可以按照以下代码阅读它。 I need help in reading the columns where "pod" has value "kube-apiserver-78" or "kube-apiserver-79" and username has "system:serviceaccount:xyz" or "system:serviceaccount:poq": can someone help me how can I translate it below code.我需要帮助来阅读“pod”的值为“kube-apiserver-78”或“kube-apiserver-79”且用户名的值为“system:serviceaccount:xyz”或“system:serviceaccount:poq”的列：有人可以帮忙吗我如何在代码下面翻译它。

df = spark.read.json('abfss://insights-logs-kube-audit@azogs.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/5IS/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV/y=2022/m=08/d=09/h=11/m=00/')

df.show()

Sample Json file in Storage container Which I read:我读到的存储容器中的示例 Json 文件：

{ "operationName": "Microsoft.ContainerService/managedClusters/diagnosticLogs/Read", "category": "kube-audit", "ccpNamespace": "5f", "resourceId": "/SUBSCRIPTIONS/SID/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV", "properties": {"log":"{\"kind\":\"Event\",\"apiVersion\":\"audit.k8s.io/v1\",\"level\":\"Metadata\",\"auditID\":\"b7b1ca3\",\"stage\":\"ResponseComplete\",\"requestURI\":\"/apis/chaos-mesh.org/v1alpha1/namespaces/ve/httpchaos?limit=500\",\"verb\":\"list\",\"user\":{\"username\":\"system:serviceaccount:xyz\",\"uid\":\"3eb35e\",\"groups\":[\"system:serviceaccounts\",\"system:serviceaccounts:internal-services\",\"system:authenticated\"]},\"sourceIPs\":[\"100.100.100.100\"],\"userAgent\":\"ktl/v1.18.10 (linux/amd64) kubernetes/62c\",\"objectRef\":{\"resource\":\"httpchaos\",\"namespace\":\"vo\",\"apiGroup\":\"chaos-mesh.org\",\"apiVersion\":\"v1alpha1\"},\"responseStatus\":{\"metadata\":{},\"code\":200},\"requestReceivedTimestamp\":\"2022-05-23T13:45:13.140759Z\",\"stageTimestamp\":\"2022-05-23T13:45:13.146101Z\",\"annotations\":{\"authentication.k8s.io/legacy-token\":\"system:serviceaccount:ixyzr\",\"authorization.k8s.io/decision\":\"allow\",\"authorization.k8s.io/reason\":\"RBAC: allowed by ClusterRoleBinding \\\"admin\\\" of ClusterRole \\\"cluster-admin\\\" to ServiceAccount \\\"abc/xyz\\\"\"}}\n","stream":"stdout","pod":"kube-apiserver-78"}, "time": "2022-05-23T13:45:13.0000000Z", "Cloud": "AzureCloud", "Environment": "prod", "UnderlayClass": "hcp-underlay", "UnderlayName": "h-24"}
{ "operationName": "Microsoft.ContainerService/managedClusters/diagnosticLogs/Read", "category": "kube-audit", "ccpNamespace": "5f", "resourceId": "/SUBSCRIPTIONS/SID/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV", "properties": {"log":"{\"kind\":\"Event\",\"apiVersion\":\"audit.k8s.io/v1\",\"level\":\"Metadata\",\"auditID\":\"b7b1cax3\",\"stage\":\"ResponseComplete\",\"requestURI\":\"/apis/chaos-mesh.org/v1alpha1/namespaces/ve/httpchaos?limit=500\",\"verb\":\"list\",\"user\":{\"username\":\"system:serviceaccount:xyz\",\"uid\":\"3eb35e\",\"groups\":[\"system:serviceaccounts\",\"system:serviceaccounts:internal-services\",\"system:authenticated\"]},\"sourceIPs\":[\"100.100.100.100\"],\"userAgent\":\"ktl/v1.18.10 (linux/amd64) kubernetes/62c\",\"objectRef\":{\"resource\":\"httpchaos\",\"namespace\":\"vo\",\"apiGroup\":\"chaos-mesh.org\",\"apiVersion\":\"v1alpha1\"},\"responseStatus\":{\"metadata\":{},\"code\":200},\"requestReceivedTimestamp\":\"2022-05-23T13:45:13.140759Z\",\"stageTimestamp\":\"2022-05-23T13:45:13.146101Z\",\"annotations\":{\"authentication.k8s.io/legacy-token\":\"system:serviceaccount:ixyzr\",\"authorization.k8s.io/decision\":\"allow\",\"authorization.k8s.io/reason\":\"RBAC: allowed by ClusterRoleBinding \\\"admin\\\" of ClusterRole \\\"cluster-admin\\\" to ServiceAccount \\\"abc/xyz\\\"\"}}\n","stream":"stdout","pod":"kube-apiserver-78"}, "time": "2022-05-23T13:45:13.0000000Z", "Cloud": "AzureCloud", "Environment": "prod", "UnderlayClass": "hcp-underlay", "UnderlayName": "h-24"}

Answer 1

To query Json file After reading it convert it into temporal tables in Apache Spark and query them using Spark SQL.查询Json文件读取后将其转换为 Apache Spark 中的时态表，并使用 Spark SQL 查询它们。

To convert it into temporal table, use command:要将其转换为临时表，请使用命令：

df.createOrReplaceTempView("Name for temporal table")

Then query on this temporal table using Spark SQL.然后使用 Spark SQL 查询这个时态表。

SELECT * FROM "Name for temporal table"
WHERE (pod = 'kube-apiserver-78' or pod = 'kube-apiserver-79') 
and (username = 'system:serviceaccount:xyz' or username = 'system:serviceaccount:poq')

Reference: Query JSON Files with Azure Synapse Analytics Notebooks参考：使用 Azure Synapse Analytics Notebooks 查询 JSON 文件

转换和读取 Azure Synapse notebook 中的 json 文件

问题描述

1 个解决方案

解决方案1
0 2022-08-30 12:30:15

转换和读取 Azure Synapse notebook 中的 json 文件

问题描述

1 个解决方案

解决方案1 0 2022-08-30 12:30:15

解决方案1
0 2022-08-30 12:30:15