
Redshift CPU utilisation is 100 percent most of the time

I have a 96-vCPU Redshift ra3.4xlarge 8-node cluster, and the CPU utilisation is at 100 percent most of the time. It was a dc2.large 3-node cluster before, which was also always at 100 percent; that's why we upgraded to ra3. We do most of our computation on Redshift, but the data volume is not that large. I read somewhere that no matter how much compute you add, unless the increase is significant, there will only be a slight improvement in computation. Can anyone explain this?

I can give it a shot. Having 100% CPU for long stretches of time is generally not a good (optimal) thing in Redshift. You see, Redshift is made for performing analytics on massive amounts of structured data. To do this it utilizes several resources: disk IO bandwidth, memory, CPU, and network bandwidth. If your workload is well matched to Redshift, your utilization of all of these will average around 60%: sometimes CPU bound, sometimes memory bound, sometimes network bandwidth bound, and so on. Lots of data being read means disk IO bandwidth is at a premium; lots of redistribution of data means network IO bandwidth is the constraint. If you are using all of these resources above 50% capacity, you are getting what you paid for. Once any one of them gets to 100%, there is a significant drop-off in performance, as working around the oversubscribed resource steals throughput.

Now you are in a situation where you are seeing 100% CPU for a significant portion of the operating time, right? This means you have all these other attributes you have paid for but are not using, AND you are incurring inefficiencies to work around the bottleneck (though of all the factors, high CPU has the least overhead). The big question is why.

There are a few possibilities, but the most likely, in my experience, is inefficient queries. An example might be the best way to explain this. I've seen queries that are intended to find all the combinations of certain factors from several tables. So they cross join these tables, but this produces lots of repeats, so they add DISTINCT; problem solved. But this still creates all the duplicates and only then reduces the set down. All the work is being done and most of the results are thrown away. However, if they pared down the factors in each table first, then cross joined the reduced sets, the total work would be significantly lower. This pattern produces exactly what you are seeing: high CPU as the cluster spins generating repeated combinations and then discarding most of them.
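To make the pattern above concrete, here is a minimal SQL sketch of both forms; the table and column names (orders, shipments, region, carrier) are invented for illustration, not taken from the question:

```sql
-- Anti-pattern: cross join the full tables, then deduplicate.
-- Redshift must materialize every row combination before DISTINCT
-- collapses them, so most of the CPU work is thrown away.
SELECT DISTINCT o.region, s.carrier
FROM orders o
CROSS JOIN shipments s;

-- Better: pare each side down to its distinct factors first, so the
-- cross join only combines the small, already-unique sets.
SELECT r.region, c.carrier
FROM (SELECT DISTINCT region FROM orders) r
CROSS JOIN (SELECT DISTINCT carrier FROM shipments) c;
```

Both queries return the same result set, but the second does the reduction before the multiplication instead of after it, which is the difference between work proportional to rows(orders) × rows(shipments) and work proportional to the (usually tiny) number of distinct values.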

If you have many of this type of "fat in the middle" query, where lots of extra data is created and immediately reduced, you won't get much benefit from adding CPU resources. Things will get 2X faster with 2X the cluster size, but you are also buying 2X of all the other resources that aren't helping you. You would expect that buying 2X CPU and 2X memory and 2X disk IO etc. would give you much more than a 2X improvement. Being constrained on one thing makes scaling costly. Also, you are unlikely to see the CPU utilization come down, as your queries just "spin the tires" of the CPU. More CPUs will just mean you can run more queries, resulting in more spinning tires.

Now the above is just my #1 guess based on my consulting experience. It could be that your workload just isn't right for Redshift. I've seen people try to put many small database problems into Redshift, thinking that since it's powerful it must be good at this too. They turn up the slot count to try to pump more work into Redshift, but just create more issues. Or I've seen people try to run transactional workloads. Or... If you have the wrong tool for the job, it may not work well. One 6-ton dump truck isn't the same thing as a 50-motorcycle delivery team; each has its purpose, but they aren't interchangeable.

Another possibility is that you have a very unusual workload, but Redshift is still the best tool for the job. You don't need all the strengths of Redshift, but that's ok; you are getting the job done at an appropriate cost. In this case 100% CPU is just how your workload uses Redshift. It's not a problem, just reality. Now, I doubt this is the case, but it is possible. I'd want to be sure I'm getting all the value from the money I'm spending before assuming everything is ok.
