
What are Azure Data Explorer external table partitions good for?

Adding a partition to the external table definition does not help with a query on the partition.

Blob path example:

  • /data/1234/2021/12/02/9483D.parquet
  • /data/1235/2021/12/02/12345.parquet

Partition (pseudo-syntax, not the real one): '/data/'uniqueid'/yyyy/MM/dd/'

So only two uniqueid values are in the storage path. The total file count is ~1 million, spread across different dates in the path.

So I defined 2 partitions as virtual columns:

  1. uniqueid
  2. datetime

Executing a query on uniqueid, such as table | summarize by uniqueid, goes over all files in the blob storage for some reason.

Since uniqueid is both a partition and a virtual column, shouldn't the query be super fast, given that there are only 2 values for it in the path? Am I totally missing the point of partitioning?

EDIT: adding a sample:

// Both partition columns are virtual: they come from the blob path, not from the parquet data.
.create external table ['sensordata'] (['timestamp']:long, ['value']:real)
    kind = adl
    partition by (['uniqueid']:string, ['datecreated']:datetime)
    pathformat = (['uniqueid'] '/' datetime_pattern("yyyy/MM/dd", ['datecreated']))
    dataformat = parquet
    (
        h@'abfss://XXXXXX@YYYYYYYY.dfs.core.windows.net/histdata;impersonate'
    )
    with (FileExtension='.parquet')

Query sample:

external_table("sensordata")
| summarize by uniqueid

Thanks for your input, @user998888.

We have many optimizations for partitioned external tables, and we invest significant effort in adding more. However, we still haven't optimized the type of query you provided. It's on our list.
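For contrast, here is a minimal sketch of the kind of query where partitions are generally expected to help: one that filters on the partition columns, so files under non-matching paths can be skipped. It assumes the sensordata table from the question, with the date partition column named datecreated; the uniqueid value and the date come from the blob path example above, and whether the files are actually skipped depends on the optimizations available in your cluster.

```kusto
// Assumption: 'sensordata' is the external table defined in the question,
// partitioned by the virtual columns uniqueid (string) and datecreated (datetime).
external_table("sensordata")
// Predicates on partition columns let the engine narrow down which blob paths to read.
| where uniqueid == "1234"
| where datecreated >= datetime(2021-12-02) and datecreated < datetime(2021-12-03)
| summarize count() by uniqueid
```

Unlike summarize by uniqueid on its own, this query gives the engine explicit predicates to compare against the partition values encoded in each path.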
