
How to generate statistics of a table and columns of a table in Snowflake?

Original post: 2019-11-22 06:56:27 · Tags: sql, snowflake-datawarehouse

Is there any function in Snowflake, like Generate Statistics in Netezza, that generates column metadata (duplicates, unique values, min value, max value, etc.)?

2 answers

No, not really.

You have the TABLES view, which contains the size (storage) and number of rows,
but the rest of the information (including the COLUMNS view) is schema metadata, not data metadata.
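
As a minimal sketch of what the TABLES view offers, the query below reads the row count and storage size for one table. The names MY_DB, MY_SCHEMA and MY_TABLE are placeholders, not names from the question; substitute your own.

```sql
-- Table-level metadata only: row count and storage size in bytes.
-- MY_DB, MY_SCHEMA and MY_TABLE are hypothetical placeholder names.
SELECT table_name,
       row_count,
       bytes
FROM my_db.information_schema.tables
WHERE table_schema = 'MY_SCHEMA'
  AND table_name   = 'MY_TABLE';
```

Note that this returns nothing per column; it only describes the table as a whole.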

On the other hand, the table structure itself (i.e. micro-partitions) contains table metadata that makes functions such as MIN() and MAX() very efficient. Some of the table statistics may be cached globally (i.e. in the Cloud Services layer of the Snowflake architecture).
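
Since there is no single built-in command, one way to approximate what the question asks for is an ordinary aggregate query per column. This is only a sketch: my_table and my_col are placeholder names, and "duplicates" is assumed here to mean non-null rows beyond the first occurrence of each value.

```sql
-- Manual column profile for one column.
-- my_table and my_col are hypothetical placeholder names.
SELECT COUNT(*)                               AS row_count,
       COUNT(my_col)                          AS non_null_count,
       COUNT(DISTINCT my_col)                 AS distinct_count,
       -- assumed definition: non-null rows minus distinct values
       COUNT(my_col) - COUNT(DISTINCT my_col) AS duplicate_count,
       MIN(my_col)                            AS min_value,
       MAX(my_col)                            AS max_value
FROM my_table;
```

On large tables, APPROX_COUNT_DISTINCT(my_col) may be a cheaper substitute for the exact distinct count, at the cost of a small estimation error.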

Thank you for the question on stats gathering in Snowflake. Some information:

During data loading (all DML such as COPY, INSERT, UPDATE, and DELETE), Snowflake already gathers these stats automatically at the micro-partition level.
During query processing, our optimizer automatically uses these stats to improve query performance.
Automatic background services such as the automatic clustering service (if enabled for a given table) also use those stats to continuously and incrementally fine-tune the clustering quality of a table.
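
The micro-partition stats themselves are not exposed as a table, but a summary of how well a table is clustered on given columns can be inspected. A minimal sketch, assuming a table my_table and a column my_col (both placeholder names):

```sql
-- Returns a JSON string describing clustering quality for the given column(s),
-- derived from micro-partition metadata.
-- my_table and my_col are hypothetical placeholder names.
SELECT SYSTEM$CLUSTERING_INFORMATION('my_table', '(my_col)');
```

This reports clustering depth and partition overlap, not value-level statistics such as duplicates or distinct counts.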

All of these automatic features work without manual intervention by the user (which is why Snowflake is known as a self-tuning, simple-to-use data warehousing platform).