Transforming and reading JSON files in an Azure Synapse notebook
I have the JSON below in one of my storage accounts, and I am able to read it with the following code. I need help reading only the records where "pod" has the value "kube-apiserver-78" or "kube-apiserver-79" and the username is "system:serviceaccount:xyz" or "system:serviceaccount:poq". Can someone help me translate that into the code below?
df = spark.read.json('abfss://insights-logs-kube-audit@azogs.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/5IS/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV/y=2022/m=08/d=09/h=11/m=00/')
df.show()
Sample JSON file in the storage container that I read:
{ "operationName": "Microsoft.ContainerService/managedClusters/diagnosticLogs/Read", "category": "kube-audit", "ccpNamespace": "5f", "resourceId": "/SUBSCRIPTIONS/SID/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV", "properties": {"log":"{\"kind\":\"Event\",\"apiVersion\":\"audit.k8s.io/v1\",\"level\":\"Metadata\",\"auditID\":\"b7b1ca3\",\"stage\":\"ResponseComplete\",\"requestURI\":\"/apis/chaos-mesh.org/v1alpha1/namespaces/ve/httpchaos?limit=500\",\"verb\":\"list\",\"user\":{\"username\":\"system:serviceaccount:xyz\",\"uid\":\"3eb35e\",\"groups\":[\"system:serviceaccounts\",\"system:serviceaccounts:internal-services\",\"system:authenticated\"]},\"sourceIPs\":[\"100.100.100.100\"],\"userAgent\":\"ktl/v1.18.10 (linux/amd64) kubernetes/62c\",\"objectRef\":{\"resource\":\"httpchaos\",\"namespace\":\"vo\",\"apiGroup\":\"chaos-mesh.org\",\"apiVersion\":\"v1alpha1\"},\"responseStatus\":{\"metadata\":{},\"code\":200},\"requestReceivedTimestamp\":\"2022-05-23T13:45:13.140759Z\",\"stageTimestamp\":\"2022-05-23T13:45:13.146101Z\",\"annotations\":{\"authentication.k8s.io/legacy-token\":\"system:serviceaccount:ixyzr\",\"authorization.k8s.io/decision\":\"allow\",\"authorization.k8s.io/reason\":\"RBAC: allowed by ClusterRoleBinding \\\"admin\\\" of ClusterRole \\\"cluster-admin\\\" to ServiceAccount \\\"abc/xyz\\\"\"}}\n","stream":"stdout","pod":"kube-apiserver-78"}, "time": "2022-05-23T13:45:13.0000000Z", "Cloud": "AzureCloud", "Environment": "prod", "UnderlayClass": "hcp-underlay", "UnderlayName": "h-24"}
{ "operationName": "Microsoft.ContainerService/managedClusters/diagnosticLogs/Read", "category": "kube-audit", "ccpNamespace": "5f", "resourceId": "/SUBSCRIPTIONS/SID/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV", "properties": {"log":"{\"kind\":\"Event\",\"apiVersion\":\"audit.k8s.io/v1\",\"level\":\"Metadata\",\"auditID\":\"b7b1cax3\",\"stage\":\"ResponseComplete\",\"requestURI\":\"/apis/chaos-mesh.org/v1alpha1/namespaces/ve/httpchaos?limit=500\",\"verb\":\"list\",\"user\":{\"username\":\"system:serviceaccount:xyz\",\"uid\":\"3eb35e\",\"groups\":[\"system:serviceaccounts\",\"system:serviceaccounts:internal-services\",\"system:authenticated\"]},\"sourceIPs\":[\"100.100.100.100\"],\"userAgent\":\"ktl/v1.18.10 (linux/amd64) kubernetes/62c\",\"objectRef\":{\"resource\":\"httpchaos\",\"namespace\":\"vo\",\"apiGroup\":\"chaos-mesh.org\",\"apiVersion\":\"v1alpha1\"},\"responseStatus\":{\"metadata\":{},\"code\":200},\"requestReceivedTimestamp\":\"2022-05-23T13:45:13.140759Z\",\"stageTimestamp\":\"2022-05-23T13:45:13.146101Z\",\"annotations\":{\"authentication.k8s.io/legacy-token\":\"system:serviceaccount:ixyzr\",\"authorization.k8s.io/decision\":\"allow\",\"authorization.k8s.io/reason\":\"RBAC: allowed by ClusterRoleBinding \\\"admin\\\" of ClusterRole \\\"cluster-admin\\\" to ServiceAccount \\\"abc/xyz\\\"\"}}\n","stream":"stdout","pod":"kube-apiserver-78"}, "time": "2022-05-23T13:45:13.0000000Z", "Cloud": "AzureCloud", "Environment": "prod", "UnderlayClass": "hcp-underlay", "UnderlayName": "h-24"}
To query the JSON file after reading it, register the DataFrame as a temporary view in Apache Spark and query it using Spark SQL.
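To register it as a temporary view, use the command below (audit_logs is just a placeholder name; any valid identifier without spaces works):
df.createOrReplaceTempView("audit_logs")
Then query this temporary view using Spark SQL. In the sample records, pod is nested under properties, and username sits inside the JSON string stored in properties.log, so the filter has to reference those paths:
SELECT * FROM audit_logs
WHERE properties.pod IN ('kube-apiserver-78', 'kube-apiserver-79')
AND get_json_object(properties.log, '$.user.username') IN ('system:serviceaccount:xyz', 'system:serviceaccount:poq')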
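To check the filter logic on a single record without a Spark session, here is a minimal plain-Python sketch. It assumes the record layout shown in the sample file, where pod lives under properties and the username is inside properties.log, which is itself a JSON document stored as a string and needs a second parse. The function name matches and the shortened sample record are hypothetical and only serve the illustration.

```python
import json

# Filter values taken from the question.
PODS = {"kube-apiserver-78", "kube-apiserver-79"}
USERNAMES = {"system:serviceaccount:xyz", "system:serviceaccount:poq"}


def matches(line: str) -> bool:
    """Return True if one JSON-lines record passes the pod/username filter."""
    record = json.loads(line)
    props = record.get("properties", {})
    # `log` is a JSON document stored as a string, so it is parsed again.
    log = json.loads(props.get("log", "{}"))
    username = log.get("user", {}).get("username")
    return props.get("pod") in PODS and username in USERNAMES


# Shortened, hypothetical stand-in for one record of the sample file.
inner = json.dumps(
    {"kind": "Event", "user": {"username": "system:serviceaccount:xyz"}}
) + "\n"
sample = json.dumps(
    {
        "category": "kube-audit",
        "properties": {"log": inner, "stream": "stdout", "pod": "kube-apiserver-78"},
    }
)

print(matches(sample))  # True
```

The same two-step lookup is what the Spark SQL filter has to express: a struct field access for the pod and a JSON-path extraction from the log string for the username.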
Reference: Query JSON Files with Azure Synapse Analytics Notebooks