简体   繁体   中英

What are Azure Data Explorer external table partitions good for?

Adding pertition to the external table definition does not help with a query on the partition.

Blob path example

  • /data/1234/2021/12/02/9483D.parquet
  • /data/1235/2021/12/02/12345.parquet

Partition (pseudo syntax not the real one): '/data/'uniqueid'/yyyy/MM/dd/'

So only two uniqueids values are in the storage path. Total files count ~ 1 million for different dates in the path

So I defined 2 partitions as virtual columns:

  1. uniqueid
  2. datetime

Executing a query on the uniqueid like: table | summarize by uniqueid table | summarize by uniqueid goes over all files in the blob storage for some reason.

As the uniqueid is a partition and as virtual column, shouldn't the query be super fast as we have only 2 values in the path for it? Am I totally missing the point of partitioning?

EDIT add smaple:

.create external table ['sensordata'] (['timestamp']:long,['value']:real)
    kind = adl
partition by (['uniqueid']:string ,['datecreated']:datetime )
pathformat = (['uniqueid']  '/' datetime_pattern("yyyy/MM/dd", ['daterecorded']))
    dataformat = parquet
    (
        h@'abfss://XXXXXX@YYYYYYYY.dfs.core.windows.net/histdata;impersonate'
    )
    with (FileExtension='.parquet')

Query sample:

sensordata
| summarize by uniqueid

Thanks for your input, @user998888.

We have many optimizations for partitioned external tables, and we invest significant effort in adding more and more optimizations. But we still haven't optimized the type of query like the one you provided. It's on our list.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM