
Redshift WLM config: how is unallocated memory used?

When you define Redshift query queues, you can assign the proportion of memory allocated to each queue. So for example, if you had 5 queues, you might assign each one of them 20% of the memory. However, you are also allowed to allocate the memory such that a portion of it remains unallocated.

In this documentation: http://docs.aws.amazon.com/redshift/latest/dg/cm-c-defining-query-queues.html it says, "Any unallocated memory is managed by Amazon Redshift and can be temporarily given to a queue if the queue requests additional memory for processing. For example, if you configure four queues, you can allocate memory as follows: 20 percent, 30 percent, 15 percent, 15 percent. The remaining 20 percent is unallocated and managed by the service."

Earlier in the documentation, it says, "If a specific query needs more memory than is allocated to a single query slot, you can increase the available memory by increasing the wlm_query_slot_count parameter. The following example sets wlm_query_slot_count to 10, performs a vacuum, and then resets wlm_query_slot_count to 1."

Is this related to the memory allocation? Can the query slot count adjustment be used to temporarily consume more memory than the whole queue is normally allowed?

I think my question is really about this part of the first quote: "Any unallocated memory is managed by Amazon Redshift and can be temporarily given to a queue if the queue requests additional memory for processing."

Does this mean that the user running a query has to specifically request the additional memory? Does this mean that leaving some memory unallocated is of no use unless you make these specific requests?

The two concepts of wlm_query_slot_count and memory allocation for a queue are different.

When you set the concurrency level to 20, for example, you are creating 20 execution slots. If these smaller slots (compared to the 5 larger slots you get by default) are too small for some queries (such as VACUUM or large reports), you can give those specific queries multiple slots instead of a single one, using wlm_query_slot_count.

The allocation of resources to the various slots in terms of CPU, IO and RAM doesn't have to be uniform: you can give some queues more memory than others if the queries sent to those queues need more memory. You can tell that more memory is needed when you see more queries spilling to disk because they ran out of memory during their calculation.

For each query that you run, Redshift estimates the memory requirements based on the columns you are hitting and the functions you are applying to these columns (this is another good reason to keep column definitions as narrow as possible). If the WLM has unallocated memory, it can give some of it to the queries that need it.

Nevertheless, when you create such queue definitions you lose some of the cluster's flexibility to assign resources to queries. For example, you might end up with one queue that is completely jammed while other queues sit idle and waste cluster resources. Therefore, do it with care, and monitor the usage of these queues to verify that you are actually improving your cluster's prioritization and performance and not hurting it.

The short answer is - wlm_query_slot_count and unallocated memory management are two different, orthogonal things.

Think of wlm_query_slot_count as cell merge in Excel. If you have 5 cells (5 slots in a queue), each piece of text can by default only take 1 cell (1 slot). By setting wlm_query_slot_count explicitly for the query, you are telling Redshift to merge the cells (slots) for that bit of text (query). So if you set wlm_query_slot_count to 3, this particular query will take 3 slots; it's like deciding to spread a long text across 3 merged cells in Excel.

From the queue management point of view, it is as if 3 slots have already been taken. So only 2 more 1-slot queries are allowed into the queue; everyone else has to wait.
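
The slot accounting described here can be sketched in a few lines of Python. This is only a toy model, not how Redshift's scheduler is implemented: the `Queue` class and its `try_admit` method are hypothetical names made up for illustration, and a real queue makes queries wait rather than simply returning `False`.

```python
class Queue:
    """Toy model of a WLM queue with a fixed number of slots (hypothetical)."""

    def __init__(self, total_slots):
        self.total_slots = total_slots
        self.used_slots = 0

    def try_admit(self, slot_count=1):
        """Admit a query claiming `slot_count` slots; return False if it must wait."""
        if self.used_slots + slot_count > self.total_slots:
            return False
        self.used_slots += slot_count
        return True


queue = Queue(total_slots=5)
results = [queue.try_admit(slot_count=3)]         # query with wlm_query_slot_count = 3
results += [queue.try_admit() for _ in range(3)]  # three ordinary 1-slot queries
print(results)  # [True, True, True, False]
```

The 3-slot query and the next two 1-slot queries get in; the third 1-slot query finds the queue full.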

In terms of memory, a queue has a fixed overall memory allocation, spread equally between its slots. So if the whole queue has 100GB of memory and 5 slots, each slot gets 20GB. A query that was given 3 slots in this queue would then get 60GB.
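
The same arithmetic as a small Python sketch. The function names are invented for illustration, and the check that a query cannot claim more slots than the queue has is an assumption of this sketch, not something taken from the quoted documentation.

```python
def slot_memory_gb(queue_memory_gb, slots_in_queue):
    """Memory per slot when the queue's memory is split equally between slots."""
    return queue_memory_gb / slots_in_queue


def query_memory_gb(queue_memory_gb, slots_in_queue, wlm_query_slot_count=1):
    """Memory available to one query that claims `wlm_query_slot_count` slots."""
    # Assumption of this sketch: a query cannot claim more slots than the queue has.
    if not 1 <= wlm_query_slot_count <= slots_in_queue:
        raise ValueError("slot count must be between 1 and the queue's slot count")
    return slot_memory_gb(queue_memory_gb, slots_in_queue) * wlm_query_slot_count


print(query_memory_gb(100, 5))     # 20.0 (default: one slot)
print(query_memory_gb(100, 5, 3))  # 60.0 (three merged slots)
```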

And "unallocated memory management" is orthogonal to that: regardless of slots and queues, if memory is needed and some is unallocated, Redshift can decide at its own discretion to give it to any query (I think the wording "if the queue requests additional memory" is misleading), usually based on the plan/table statistics.
