Kusto / Azure Data Explorer - Distinct count in kusto queries

Question

I'm using Application Insights with a customEvent and need to get the number of events with a distinct field.

The event looks something like this:

{
    "statusCode" : 200,
    "some_field": "ABC123QWERTY"
}

I want the number of unique some_field values with statusCode 200. I've looked at this question and tried several different queries, some of which give different results. In SQL it would look something like this:

SELECT COUNT(DISTINCT my_field) AS Count
FROM customEvents
WHERE statusCode=200

Which one is correct?

1 - dcount with default accuracy

customEvents
| where (customDimensions.statusCode == 200) and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize dcount(tostring(customDimensions.some_field))

17,853 items

2 - Count by my_field and count number of rows

customEvents
| extend my_field = tostring(customDimensions.some_field)
| where customDimensions.statusCode == 200 and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize Count = count() by my_field

17,774 items.

3 - summarize with by some_field

customEvents
| extend StatusCode = tostring(customDimensions["statusCode"]), MyField = tostring(customDimensions["some_field"])
| where timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize any(StatusCode) by MyField
| summarize Count = count() by any_StatusCode

17,626 items.

4 - dcount with higher accuracy?

customEvents
| where (customDimensions.statusCode == 200) and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize dcount(tostring(customDimensions.some_field),4)

17,736 items

5 - count_distinct from preview

customEvents
| where (customDimensions.statusCode == 200) and timestamp >= startofday(datetime(2022-12-01T00:00:00Z)) and timestamp <= endofday(datetime(2022-12-31T00:00:00Z))
| summarize count_distinct(tostring(customDimensions.some_field))

17,744 items

The documentation on learn.microsoft.com states:

Use dcount and dcountif to count distinct values in a specific column.

And dcount-aggfunction mentions the accuracy:

Returns an estimate of the number of distinct values of expr in the group.

count_distinct seems to be the correct way:

Counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted.

Answer 1

count_distinct() is a new KQL function that returns an accurate result.

dcount() returns an approximate result.
It can be used with a 2nd argument, a constant integer with value 0, 1, 2, 3 or 4 (0 = fast, 1 = default, 2 = accurate, 3 = extra accurate, 4 = super accurate).

In your examples (specifically "4 - dcount with higher accuracy?") you have not used a 2nd argument.
Higher accuracy here is meant statistically: the error is bounded by a lower value.
In theory (and in practice), dcount() with lower accuracy may in some scenarios yield a result that is closer to the true count than dcount() with higher accuracy.
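
To see the difference without touching Application Insights data, here is a minimal sketch that runs on synthetic input. The column name x and the row count are arbitrary choices for illustration. Only count_distinct() is guaranteed to return exactly 1,000,000 here; the dcount() columns are estimates, and a lower accuracy level may happen to land closer to the true value than a higher one.

```kusto
// Synthetic input: exactly 1,000,000 distinct integers.
range x from 1 to 1000000 step 1
| summarize
    exact = count_distinct(x),  // exact distinct count
    d0 = dcount(x, 0),          // fastest, least accurate estimate
    d1 = dcount(x),             // default accuracy (same as passing 1)
    d4 = dcount(x, 4)           // most accurate estimate, still approximate
```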

Having said that -

I would guess that you executed your queries with a UI filter of last 24 hours or something similar.
This means that each execution ran over a different timespan.
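
One way to rule out differing timespans is to compute all variants in a single query, so every aggregate sees exactly the same rows. The sketch below assumes the customEvents schema from the question; the let names (startTime, endTime) and the output column names are made up for illustration. The status code is compared as a string because values in customDimensions are typically stored as strings, which is an assumption about your data.

```kusto
// Fixed window, defined once: December 2022 (end is exclusive).
let startTime = datetime(2022-12-01);
let endTime = datetime(2023-01-01);
customEvents
| where timestamp >= startTime and timestamp < endTime
| where tostring(customDimensions.statusCode) == "200"
| extend some_field = tostring(customDimensions.some_field)
| summarize
    exact = count_distinct(some_field),    // exact distinct count
    approx_default = dcount(some_field),   // estimate, default accuracy
    approx_high = dcount(some_field, 4)    // estimate, highest accuracy
```

If the three columns still differ, the gap between exact and the two estimates comes from dcount()'s approximation rather than from the data changing between runs.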

solution1
0 2023-01-17 11:18:23
