What are Azure Data Explorer external table partitions good for?

Question

Adding pertition to the external table definition does not help with a query on the partition.

Blob path example

/data/1234/2021/12/02/9483D.parquet
/data/1235/2021/12/02/12345.parquet

Partition (pseudo syntax not the real one): '/data/'uniqueid'/yyyy/MM/dd/'

So only two uniqueids values are in the storage path. Total files count ~ 1 million for different dates in the path

So I defined 2 partitions as virtual columns:

uniqueid
datetime

Executing a query on the uniqueid like: table | summarize by uniqueid table | summarize by uniqueid goes over all files in the blob storage for some reason.

As the uniqueid is a partition and as virtual column, shouldn't the query be super fast as we have only 2 values in the path for it? Am I totally missing the point of partitioning?

EDIT add smaple:

.create external table ['sensordata'] (['timestamp']:long,['value']:real)
    kind = adl
partition by (['uniqueid']:string ,['datecreated']:datetime )
pathformat = (['uniqueid']  '/' datetime_pattern("yyyy/MM/dd", ['daterecorded']))
    dataformat = parquet
    (
        h@'abfss://XXXXXX@YYYYYYYY.dfs.core.windows.net/histdata;impersonate'
    )
    with (FileExtension='.parquet')

Query sample:

sensordata
| summarize by uniqueid

Answer 1

Thanks for your input, @user998888.

We have many optimizations for partitioned external tables, and we invest significant effort in adding more and more optimizations. But we still haven't optimized the type of query like the one you provided. It's on our list.

What are Azure Data Explorer external table partitions good for?

Question

1 answers

solution1
0 2022-02-28 14:05:42

What are Azure Data Explorer external table partitions good for?

Question

1 answers

solution1 0 2022-02-28 14:05:42

solution1
0 2022-02-28 14:05:42