
CETAS times out for large tables in Synapse Serverless SQL

I'm trying to create a new external table from an existing external table in an Azure Synapse Serverless SQL pool, using a CETAS statement ( CREATE EXTERNAL TABLE AS SELECT * FROM <table> ). The table I'm selecting from is a very large external table built on around 30 GB of Parquet data stored in ADLS Gen2, and the query always times out after about 30 minutes. I've tried premium storage and most, if not all, of the suggestions made here, but none of it helped and the query still times out. The error I get in Synapse Studio is:

Statement ID: {550AF4B4-0F2F-474C-A502-6D29BAC1C558} | Query hash: 0x2FA8C2EFADC713D | Distributed request ID: {CC78C7FD-ED10-4CEF-ABB6-56A3D4212A5E}. Total size of data scanned is 0 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes. Query timeout expired.

The core use case: given only the external table name, I want to create a copy, in Azure storage itself, of the data that the external table is built on.

Is there a way to resolve this timeout, or a better way to solve the problem?

This is a limitation of the serverless SQL pool.

Query timeout expired

The error Query timeout expired is returned if the query executed for more than 30 minutes on the serverless SQL pool. This is a limit of the serverless SQL pool that cannot be changed. Try to optimize your query by applying best practices, or try to materialize parts of your queries using CETAS. Check whether there is a concurrent workload running on the serverless pool, because other queries might be taking the resources. In that case you might split the workload across multiple workspaces.

Self-help for serverless SQL pool - Query Timeout Expired
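To make "materialize parts of your queries using CETAS" concrete, here is a minimal sketch that generates one CETAS statement per slice of the source table, so that each statement has less data to process. It is only an illustration: the table, data source, and file format names are placeholders, and it assumes the source table has a column (here a hypothetical integer column named year) that splits the data into reasonably sized slices. Each statement writes to its own folder. Whether each slice finishes within the 30-minute limit depends on your data.

```python
def build_cetas_statements(source_table, target_prefix, data_source,
                           file_format, filter_column, filter_values):
    """Return one CETAS statement per filter value.

    All names are placeholders for illustration. The filter column is
    assumed to hold integer values and the object names are assumed to be
    simple identifiers without a schema prefix.
    """
    statements = []
    for value in filter_values:
        # One target table and one output folder per slice.
        target_table = f"{target_prefix}_{value}"
        location = f"{target_prefix}/{filter_column}={value}/"
        statements.append(
            f"CREATE EXTERNAL TABLE [{target_table}]\n"
            f"WITH (\n"
            f"    LOCATION = '{location}',\n"
            f"    DATA_SOURCE = [{data_source}],\n"
            f"    FILE_FORMAT = [{file_format}]\n"
            f")\n"
            f"AS SELECT * FROM [{source_table}] "
            f"WHERE [{filter_column}] = {value};"
        )
    return statements


# Example with hypothetical names: two slices, one per year.
example_statements = build_cetas_statements(
    source_table="big_table",
    target_prefix="big_table_copy",
    data_source="my_adls",
    file_format="parquet_format",
    filter_column="year",
    filter_values=[2021, 2022],
)
for statement in example_statements:
    print(statement)
    print()
```

The generated statements can be run one after another in Synapse Studio. The output is a set of per-slice tables and folders rather than a single table.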

The core use case: given only the external table name, I want to create a copy, in Azure storage itself, of the data that the external table is built on.

That is simple to do with a Data Factory copy job, a Spark job, or AzCopy.
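As a rough sketch of the AzCopy route, the snippet below builds an azcopy copy command for a folder and can optionally run it. It assumes you already know the storage folder behind the external table (the sys.external_tables and sys.external_data_sources catalog views should expose the table's location and its data source location), that AzCopy is installed and authenticated, and that the URLs shown are placeholders rather than real paths.

```python
import subprocess


def build_azcopy_command(source_url, destination_url):
    """Build the argument list for a recursive folder copy with AzCopy."""
    return ["azcopy", "copy", source_url, destination_url, "--recursive"]


def copy_folder(source_url, destination_url, dry_run=True):
    """Return the command, running it first unless dry_run is True.

    With dry_run=False this assumes the azcopy binary is on PATH and has
    already been authenticated (for example with azcopy login or a SAS
    token appended to the URLs).
    """
    command = build_azcopy_command(source_url, destination_url)
    if not dry_run:
        # check=True raises CalledProcessError if AzCopy exits with an error.
        subprocess.run(command, check=True)
    return command


# Hypothetical URLs; replace them with the folder behind your external table.
example_command = copy_folder(
    "https://myaccount.dfs.core.windows.net/mycontainer/big_table/",
    "https://myaccount.dfs.core.windows.net/mycontainer/big_table_copy/",
)
print(" ".join(example_command))
```

Because this copies the files directly, nothing goes through the serverless SQL pool, so the 30-minute query limit does not come into play.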
