
How to generate statistics for a table and its columns in Snowflake?

Is there a function in Snowflake, similar to GENERATE STATISTICS in Netezza, that generates column metadata such as duplicates, unique values, min value, max value, etc.?

No, not really.

You have the TABLES view, which contains the size (storage) and number of rows, but the rest of the information (including the COLUMNS view) is schema metadata, not data metadata.
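To make the distinction concrete, here is a minimal sketch that only builds the SQL for the two views (it does not connect to Snowflake). It assumes the standard Snowflake `INFORMATION_SCHEMA` columns `ROW_COUNT` and `BYTES` in TABLES, and `COLUMN_NAME`, `DATA_TYPE`, `IS_NULLABLE`, `ORDINAL_POSITION` in COLUMNS. The helper names and the database/schema/table values are hypothetical. Names are compared as stored, so unquoted identifiers must be passed in upper case.

```python
from typing import Dict


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_metadata_queries(database: str, schema: str, table: str) -> Dict[str, str]:
    """Return SQL for the two INFORMATION_SCHEMA views (hypothetical helper).

    TABLES  -> data-level facts: row count and storage size in bytes.
    COLUMNS -> schema-level facts only: names, types, nullability.
    `database` is inserted unquoted, so it is assumed to be a trusted name.
    """
    where = (
        f"WHERE table_schema = {quote_literal(schema)} "
        f"AND table_name = {quote_literal(table)}"
    )
    return {
        "tables": (
            "SELECT table_name, row_count, bytes "
            f"FROM {database}.INFORMATION_SCHEMA.TABLES {where}"
        ),
        "columns": (
            "SELECT column_name, data_type, is_nullable "
            f"FROM {database}.INFORMATION_SCHEMA.COLUMNS {where} "
            "ORDER BY ordinal_position"
        ),
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for sql in build_metadata_queries("MY_DB", "PUBLIC", "ORDERS").values():
        print(sql)
```

Neither query returns distinct counts, duplicates, or min/max values; those have to be computed from the data itself.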

On the other hand, the table structure itself (i.e. micro-partitions) contains table metadata that makes functions such as MIN() and MAX() very efficient. Some of the table statistics may also be cached globally (i.e. in the Cloud Services layer of the Snowflake architecture).
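Since there is no built-in GENERATE STATISTICS equivalent, one workaround is to compute the profile yourself with ordinary aggregates. The sketch below is a hypothetical helper, not a Snowflake feature. It builds a single `UNION ALL` query with one branch per column, reporting row, null, distinct and duplicate counts plus MIN/MAX. MIN/MAX are cast to VARCHAR so every branch has the same column types. "Duplicates" is assumed to mean non-null values minus distinct non-null values. Column names are quoted, so pass them exactly as stored (usually upper case). MIN()/MAX() may benefit from micro-partition metadata, but `COUNT(DISTINCT ...)` generally needs a scan. On very large tables, `APPROX_COUNT_DISTINCT` is a cheaper alternative.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_profile_query(table: str, columns: Sequence[str]) -> str:
    """Build one UNION ALL query that profiles each column of `table`.

    `table` is inserted as-is (assumed to be a trusted, qualified name).
    """
    if not columns:
        raise ValueError("columns must not be empty")
    branches = []
    for col in columns:
        c = quote_ident(col)
        branches.append(
            f"SELECT {quote_literal(col)} AS column_name, "
            f"COUNT(*) AS row_count, "
            f"COUNT(*) - COUNT({c}) AS null_count, "
            f"COUNT(DISTINCT {c}) AS distinct_count, "
            # Non-null values that repeat an already-seen value.
            f"COUNT({c}) - COUNT(DISTINCT {c}) AS duplicate_count, "
            # Cast so that columns of different types can be UNIONed.
            f"CAST(MIN({c}) AS VARCHAR) AS min_value, "
            f"CAST(MAX({c}) AS VARCHAR) AS max_value "
            f"FROM {table}"
        )
    return "\nUNION ALL\n".join(branches)


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(build_profile_query("MY_DB.PUBLIC.ORDERS", ["ORDER_ID", "AMOUNT"]))
```

The column list could come from the COLUMNS view. Semi-structured columns (VARIANT, ARRAY, OBJECT) may need to be excluded or handled separately.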

Thank you for the question on stats gathering in Snowflake. Some information:

  1. During data loading and other DML (COPY, INSERT, UPDATE, DELETE), Snowflake already gathers these stats automatically at the micro-partition level.
  2. During query processing, our optimizer automatically uses these stats to improve query performance.
  3. Automatic background services such as the Automatic Clustering service (if enabled for a given table) also use these stats to continuously and incrementally fine-tune the table's clustering quality.

All of these features work automatically, without manual intervention from the user (which is why Snowflake is described as a self-tuning, easy-to-use data warehousing platform).
