
Concurrent Queries, COPY and Connections in AWS Redshift

Asked 2018-06-19 21:29:10. Tags: amazon-web-services, amazon-redshift, amazon-redshift-spectrum

I am trying to understand the difference between concurrent connections and concurrent queries in Redshift. According to the documentation, we can make up to 500 concurrent connections to a Redshift cluster, but it also says that a maximum of 15 queries can run at the same time in a cluster. Which value actually applies?

How many queries can be in the running state in a cluster at the same time? If it is 15, does that include queries in the RETURNING state as well?
How many concurrent COPY statements can run in a cluster?

We are evaluating Redshift as our primary reporting data store. If we cannot run a large number of queries simultaneously, it may be difficult for us to go with this model.

1 Answer

I think you have misread something somewhere: the maximum number of concurrent queries is 50 per WLM configuration. Refer to the thread below for the Amazon support response with more detail.

How many queries can be in the running state in a cluster at the same time? If it is 15, does that include queries in the RETURNING state as well?

At most 50 queries can be running concurrently at any one time. Yes, that count includes everything: INSERT, UPDATE, DELETE, and so on.
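To make the distinction between connections and running queries concrete, here is a minimal simulation in plain Python. It does not talk to Redshift at all; the semaphore merely stands in for the WLM slots, and the names `simulate`, `WLM_SLOTS`, and `CONNECTIONS` are made up for this illustration. Many connections can be open and submitting work, but only as many queries as there are slots execute at once, and the rest wait in the queue.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

WLM_SLOTS = 50     # concurrency limit quoted in the answer above
CONNECTIONS = 200  # hypothetical number of open client connections


def simulate(connections=CONNECTIONS, slots=WLM_SLOTS, query_seconds=0.01):
    """Return the peak number of simulated queries that ran at the same time."""
    slot_pool = threading.BoundedSemaphore(slots)  # stand-in for WLM slots
    counter_lock = threading.Lock()
    running = 0
    peak = 0

    def run_query(_connection_id):
        nonlocal running, peak
        with slot_pool:  # a query waits here (queued) until a slot is free
            with counter_lock:
                running += 1
                peak = max(peak, running)
            time.sleep(query_seconds)  # placeholder for query execution
            with counter_lock:
                running -= 1

    # One thread per connection: every connection is open and submits a query.
    with ThreadPoolExecutor(max_workers=connections) as pool:
        list(pool.map(run_query, range(connections)))
    return peak


if __name__ == "__main__":
    print("peak concurrent queries:", simulate())
```

However many connections are opened, the reported peak never exceeds the number of slots, which is the sense in which the connection limit and the query concurrency limit are two separate numbers.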

How many concurrent COPY statements can run in a cluster?

In principle, you could go up to 50 concurrent COPY commands, but COPY works a bit differently.

Amazon Redshift automatically loads in parallel from multiple data files.

If you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load, which is much slower and requires a VACUUM at the end if the table has a sort column defined. For more information about using COPY to load data in parallel, see Loading Data from Amazon S3.
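As a rough sketch of what that advice looks like in practice, the helper below builds a single COPY statement that points at a common S3 key prefix, so that one command covers all the files instead of one COPY per file. The function name `build_copy_statement`, the bucket, the prefix, and the role ARN are all placeholders invented for this example, and the string formatting is only suitable for trusted, hard-coded values.

```python
def build_copy_statement(table, s3_prefix, iam_role, options="FORMAT AS CSV"):
    """Build one COPY statement that loads every file under s3_prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"{options};"
    )


if __name__ == "__main__":
    # Placeholder bucket, prefix, and role ARN; replace with real values.
    print(
        build_copy_statement(
            table="sales",
            s3_prefix="s3://example-bucket/sales/part-",
            iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

Splitting the input into several files under one prefix and issuing one COPY lets Redshift parallelize the load itself, which is the behaviour the quoted documentation describes.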

In other words, you can run COPY commands concurrently, but make sure that only one COPY command runs at a time per table.

So in practice, the achievable COPY concurrency depends not only on the nodes in the cluster but also on the number of tables. If you have only one table and try to execute 50 loads into it concurrently, effectively only one COPY will run at a time.
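One way to follow the "one COPY per table at a time" rule on the client side is to guard each target table with its own lock, so that loads into different tables overlap while loads into the same table queue up. The sketch below is an assumption-laden illustration, not a Redshift API: `run_loads` is a hypothetical helper, the jobs are made-up `(table, s3_prefix)` pairs, and `time.sleep` stands in for actually executing the COPY.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_loads(jobs, load_seconds=0.01, max_workers=8):
    """Run (table, s3_prefix) jobs; return the peak concurrent loads per table."""
    table_locks = defaultdict(threading.Lock)  # one lock per target table
    stats_lock = threading.Lock()
    active = defaultdict(int)
    peak = defaultdict(int)

    def load(job):
        table, _s3_prefix = job
        with stats_lock:
            table_lock = table_locks[table]  # create or fetch the lock safely
        with table_lock:  # only one load per table passes this point
            with stats_lock:
                active[table] += 1
                peak[table] = max(peak[table], active[table])
            time.sleep(load_seconds)  # placeholder for executing the COPY
            with stats_lock:
                active[table] -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(load, jobs))
    return dict(peak)


if __name__ == "__main__":
    jobs = [("sales", f"s3://example-bucket/sales/{i}") for i in range(5)]
    jobs += [("users", f"s3://example-bucket/users/{i}") for i in range(3)]
    print(run_loads(jobs))
```

With a single table, all jobs line up behind the same lock, which mirrors the point above: 50 submitted loads into one table still amount to one COPY at a time.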