I'm using Application Insights with a customEvent and need to get the number of events with a distinct field.
The event looks something like this:
{
"statusCode" : 200,
"some_field": "ABC123QWERTY"
}
I want the number of unique some_field values where statusCode is 200. I've looked at this question and tried a few different queries, some of which give different answers. In SQL it would look something like this:
SELECT COUNT(DISTINCT some_field) AS Count
FROM customEvents
WHERE statusCode=200
Which one is correct?
customEvents
| where (customDimensions.statusCode == 200) and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize dcount(tostring(customDimensions.some_field))
17,853 items
customEvents
| extend my_field = tostring(customDimensions.some_field)
| where customDimensions.statusCode == 200 and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize Count = count() by my_field
17,774 items.
customEvents
| extend StatusCode = tostring(customDimensions["statusCode"]), MyField = tostring(customDimensions["some_field"])
| where timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize any(StatusCode) by MyField
| summarize Count = count() by any_StatusCode
17,626 items.
customEvents
| where (customDimensions.statusCode == 200) and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize dcount(tostring(customDimensions.some_field),4)
17,736 items
customEvents
| where (customDimensions.statusCode == 200) and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize count_distinct(tostring(customDimensions.some_field))
17,744 items
The documentation on learn.microsoft.com states:
Use dcount and dcountif to count distinct values in a specific column.
And the dcount() aggregation function reference mentions the accuracy:
Returns an estimate of the number of distinct values of expr in the group.
count_distinct seems to be the correct way:
Counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted.
count_distinct() is a new KQL function that returns an accurate result.
dcount() returns an approximate result.
It can be used with a 2nd argument, a constant integer with value 0, 1, 2, 3 or 4 (0 = fast, 1 = default, 2 = accurate, 3 = extra accurate, 4 = super accurate).
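To compare the two functions on identical data, here is a minimal sketch that computes the exact and the approximate counts in a single summarize, so every number comes from the same rows. It reuses the table, fields, and time filter from the question; the output column names (Exact, ApproxDefault, ApproxMax) are only illustrative.

```kusto
customEvents
| where timestamp >= startofday(datetime(2022-12-01T00:00:00Z))
    and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| where customDimensions.statusCode == 200
| extend my_field = tostring(customDimensions.some_field)
| summarize
    Exact = count_distinct(my_field),   // exact number of distinct values
    ApproxDefault = dcount(my_field),   // estimate at the default accuracy (1)
    ApproxMax = dcount(my_field, 4)     // estimate at the highest accuracy setting
```

If the three columns differ only slightly, the gap comes from dcount() estimating rather than from the data.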
Having said that, I would guess that you executed your queries with a UI time filter such as "Last 24 hours".
That would mean each execution ran over a different timespan.
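One way to check this guess, sketched here on the assumption that the table and status filter are the same as in the question, is to return the earliest and latest timestamps a run actually covered alongside the count. If two runs report different bounds, they were counting over different data. The column names are illustrative.

```kusto
customEvents
| where customDimensions.statusCode == 200
| summarize
    Earliest = min(timestamp),   // first event seen by this run
    Latest = max(timestamp),     // last event seen by this run
    Exact = count_distinct(tostring(customDimensions.some_field))
```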