
Querying a JSON file in Azure Synapse

I have a JSON file in an Azure Storage account that I need to query using a Synapse serverless SQL pool. Running the query below returns the first 10 rows of the file; I have copied the sample output to show the content and schema. I need a query that returns only those entries whose logs do not contain system:serviceaccount:internal-services:spinnaker and system:serviceaccounts:internal-services, and whose time is between 2022-05-23T13:45:13.0000000Z and 2022-05-23T17:45:13.0000000Z.

Can someone help me write this query? The query I run to get the first 10 rows is:

select top 10 *
from openrowset(
        bulk 'https://azdevogs.blob.core.windows.net/insights-logs-kube-audit/resourceId=/SUBSCRIPTIONS/533AEB/RESOURCEGROUPS/AZURE-TEST/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-TEST/y=2022/m=05/d=23/h=13/m=00/PT1H.json',
        format = 'csv',
        fieldterminator ='0x0b',
        fieldquote = '0x0b'
    ) with (doc nvarchar(max)) as rows
go

Result:

[{"data":[["{ \"operationName\": \"Microsoft.ContainerService/managedClusters/diagnosticLogs/Read\", \"category\": \"kube-audit\", \"ccpNamespace\": \"5f40f\", \"resourceId\": \"/SUBSCRIPTIONS/531C3AEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV\", \"properties\": {\"log\":\"{\\\"kind\\\":\\\"Event\\\",\\\"apiVersion\\\":\\\"audit.k8s.io/v1\\\",\\\"level\\\":\\\"Metadata\\\",\\\"auditID\\\":\\\"b7bca3\\\",\\\"stage\\\":\\\"ResponseComplete\\\",\\\"requestURI\\\":\\\"/apis/chaos-mesh.org/v1alpha1/namespaces/velero/httpchaos?limit=500\\\",\\\"verb\\\":\\\"list\\\",\\\"user\\\":{\\\"username\\\":\\\"system:serviceaccount:internal-services:spinnaker\\\",\\\"uid\\\":\\\"3feceb35e\\\",\\\"groups\\\":[\\\"system:serviceaccounts\\\",\\\"system:serviceaccounts:internal-services\\\",\\\"system:authenticated\\\"]},\\\"sourceIPs\\\":[\\\"35.205.140.108\\\"],\\\"userAgent\\\":\\\"kubectl/v1.18.10 (linux/amd64) kubernetes/62876fc\\\",\\\"objectRef\\\":{\\\"resource\\\":\\\"httpchaos\\\",\\\"namespace\\\":\\\"velero\\\",\\\"apiGroup\\\":\\\"chaos-mesh.org\\\",\\\"apiVersion\\\":\\\"v1alpha1\\\"},\\\"responseStatus\\\":{\\\"metadata\\\":{},\\\"code\\\":200},\\\"requestReceivedTimestamp\\\":\\\"2022-05-23T13:45:13.140759Z\\\",\\\"stageTimestamp\\\":\\\"2022-05-23T13:45:13.146101Z\\\",\\\"annotations\\\":{\\\"authentication.k8s.io/legacy-token\\\":\\\"system:serviceaccount:internal-services:spinnaker\\\",\\\"authorization.k8s.io/decision\\\":\\\"allow\\\",\\\"authorization.k8s.io/reason\\\":\\\"RBAC: allowed by ClusterRoleBinding \\\\\\\"spinnaker-cluster-admin\\\\\\\" of ClusterRole \\\\\\\"cluster-admin\\\\\\\" to ServiceAccount \\\\\\\"spinnaker/internal-services\\\\\\\"\\\"}}\\n\",\"stream\":\"stdout\",\"pod\":\"kube-apiserver-76d-q68\"}, \"time\": \"2022-05-23T13:45:13.0000000Z\", \"Cloud\": \"AzureCloud\", \"Environment\": \"prod\", \"UnderlayClass\": \"hcp-underlay\", \"UnderlayName\": \"hcp-underlay-westeurope-cx-624\"}"],["{ 
\"operationName\": \"Microsoft.ContainerService/managedClusters/diagnosticLogs/Read\", \"category\": \"kube-audit\", \"ccpNamespace\": \"5ff040f\", \"resourceId\": \"/SUBSCRIPTIONS/531B20C3AEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV\", \"properties\": {\"log\":\"{\\\"kind\\\":\\\"Event\\\",\\\"apiVersion\\\":\\\"audit.k8s.io/v1\\\",\\\"level\\\":\\\"Metadata\\\",\\\"auditID\\\":\\\"f2b766d\\\",\\\"stage\\\":\\\"ResponseComplete\\\",\\\"requestURI\\\":\\\"/apis/chaos-mesh.org/v1alpha1/namespaces/velero/iochaos?limit=500\\\",\\\"verb\\\":\\\"list\\\",\\\"user\\\":{\\\"username\\\":\\\"system:serviceaccount:internal-services:spinnaker\\\",\\\"uid\\\":\\\"3fec72feb35e\\\",\\\"groups\\\":[\\\"system:serviceaccounts\\\",\\\"system:serviceaccounts:internal-services\\\",\\\"system:authenticated\\\"]},\\\"sourceIPs\\\":[\\\"35.205.140.108\\\"],\\\"userAgent\\\":\\\"kubectl/v1.18.10 (linux/amd64) kubernetes/62876fc\\\",\\\"objectRef\\\":{\\\"resource\\\":\\\"iochaos\\\",\\\"namespace\\\":\\\"velero\\\",\\\"apiGroup\\\":\\\"chaos-mesh.org\\\",\\\"apiVersion\\\":\\\"v1alpha1\\\"},\\\"responseStatus\\\":{\\\"metadata\\\":{},\\\"code\\\":200},\\\"requestReceivedTimestamp\\\":\\\"2022-05-23T13:45:13.156899Z\\\",\\\"stageTimestamp\\\":\\\"2022-05-23T13:45:13.162219Z\\\",\\\"annotations\\\":{\\\"authentication.k8s.io/legacy-token\\\":\\\"system:serviceaccount:internal-services:spinnaker\\\",\\\"authorization.k8s.io/decision\\\":\\\"allow\\\",\\\"authorization.k8s.io/reason\\\":\\\"RBAC: allowed by ClusterRoleBinding \\\\\\\"spinnaker-cluster-admin\\\\\\\" of ClusterRole \\\\\\\"cluster-admin\\\\\\\" to ServiceAccount \\\\\\\"spinnaker/internal-services\\\\\\\"\\\"}}\\n\",\"stream\":\"stdout\",\"pod\":\"kube-apiserver-768d-q68\"}, \"time\": \"2022-05-23T13:45:13.0000000Z\", \"Cloud\": \"AzureCloud\", \"Environment\": \"prod\", \"UnderlayClass\": \"hcp-underlay\", \"UnderlayName\": 
\"hcp-underlay-westeurope-cx-624\"}"],,"schema":[{"columnName":"doc","ordinal":0,"dataTypeName":"nvarchar"}]]}]

You can use the OPENJSON function to parse your JSON array into a table. This lets you extract data from a JSON array into a relational format.
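As a minimal sketch of what OPENJSON does, the snippet below parses a small inline JSON array rather than your file. The variable @doc and its two sample elements are made up for illustration; only the shape (a time field and a nested properties object) mirrors your data.

```sql
-- Hypothetical inline sample; not taken from the real log file.
DECLARE @doc nvarchar(max) = N'[
  {"time": "2022-05-23T13:45:13.0000000Z", "properties": {"pod": "kube-apiserver-a"}},
  {"time": "2022-05-23T14:00:00.0000000Z", "properties": {"pod": "kube-apiserver-b"}}
]';

-- OPENJSON returns one row per array element; the WITH clause maps
-- JSON paths to typed columns.
SELECT j.[time], j.pod
FROM OPENJSON(@doc)
WITH (
    [time] nvarchar(40)  '$.time',
    pod    nvarchar(100) '$.properties.pod'
) AS j;
```

The same WITH-clause mapping is what turns each line of the log file into columns once the line has been read as a single text value.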

The JSON sample you provided is not valid JSON. Make sure the actual data you are querying is valid JSON; otherwise you will get a "JSON text is not properly formatted" error.
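One way to check this before parsing is ISJSON, which returns 0 for text that is not valid JSON. The sketch below reuses the OPENROWSET call from the question, with the path elided, and assumes each line of the file is read into a single doc column; it lists rows that would make OPENJSON fail.

```sql
-- Sketch: list rows whose text is not valid JSON.
-- The BULK path is elided; substitute the path to your own file.
SELECT TOP 10 r.doc
FROM OPENROWSET(
    BULK 'https://....core.windows.net/.../PT1H.json',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b'
) WITH (doc nvarchar(max)) AS r
WHERE ISJSON(r.doc) = 0;
```

If this returns no rows, every line parses as JSON and the error is more likely to come from the nested log string than from the outer document.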

Your data looks like it comes from AKS diagnostic/audit logs, but the JSON format is not the original log format. Are you transforming it into another structure on purpose? For the original Azure diagnostics log structure for AKS audit logs, the following example SQL query produces a relation with the columns username and time, which you can then filter on:

SELECT logs.[time], logitem.username
FROM OPENROWSET(
    BULK 'https://....core.windows.net/.../PT1H.json',
    FORMAT = 'CSV',
    FIELDQUOTE = '0x0b',
    FIELDTERMINATOR = '0x0b'
)
WITH (
    jsonContent varchar(MAX)  -- each line of the file is read as one text value
) AS [result]
-- First pass: parse the outer diagnostics record.
CROSS APPLY OPENJSON(jsonContent, '$')
    WITH (
        [time]  nvarchar(max) '$.time',
        logjson nvarchar(max) '$.properties.log'
    ) AS logs
-- Second pass: properties.log is itself a JSON string, so parse it again.
CROSS APPLY OPENJSON(logs.logjson, '$')
    WITH (
        username nvarchar(max) '$.user.username'
    ) AS logitem

To this query you can add a simple time and username filter in a WHERE clause, as in ordinary SQL.
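A hedged sketch of such a filter follows. It assumes that "do not have" means excluding rows where the username is the spinnaker service account or where the user's groups array contains system:serviceaccounts:internal-services; adjust the logic if you meant something else. The extra groups column and the cast to datetimeoffset are additions for illustration, and the cast will raise an error if a row holds a malformed time value.

```sql
SELECT logs.[time], logitem.username
FROM OPENROWSET(
    BULK 'https://....core.windows.net/.../PT1H.json',
    FORMAT = 'CSV',
    FIELDQUOTE = '0x0b',
    FIELDTERMINATOR = '0x0b'
)
WITH (
    jsonContent varchar(MAX)
) AS [result]
CROSS APPLY OPENJSON(jsonContent, '$')
    WITH (
        [time]  nvarchar(max) '$.time',
        logjson nvarchar(max) '$.properties.log'
    ) AS logs
CROSS APPLY OPENJSON(logs.logjson, '$')
    WITH (
        username nvarchar(max) '$.user.username',
        groups   nvarchar(max) '$.user.groups' AS JSON  -- keep the array as JSON text
    ) AS logitem
WHERE CAST(logs.[time] AS datetimeoffset)
          BETWEEN '2022-05-23T13:45:13.0000000Z' AND '2022-05-23T17:45:13.0000000Z'
  -- Exclude the spinnaker service account (rows without a username are kept).
  AND (logitem.username IS NULL
       OR logitem.username <> 'system:serviceaccount:internal-services:spinnaker')
  -- Exclude users that belong to the internal-services group.
  AND NOT EXISTS (
        SELECT 1
        FROM OPENJSON(logitem.groups) AS g
        WHERE g.[value] = 'system:serviceaccounts:internal-services'
      );
```

Note that the file path in the question points at a single hour (h=13), so a range that runs to 17:45 will likely need more than one PT1H.json file in the BULK path.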

You can find more info for the openjson syntax here: https://docs.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql
