Redshift query planner and views

I have seen in a few non-Amazon sources that the Redshift query planner has problems working with views (here is one source, here is another, here is a third). By views I mean standard SQL views, not the newly available materialized views. However, I can't find anything about this in the developer guide, and the sources listed above are a few years out of date. Does anyone know what the current situation is with the Redshift query planner and views, and, if there is official Redshift documentation that describes it, where it is located?

The blogs' arguments are, as you say, a bit outdated: one of the main drawbacks they list is that views could not be materialized at the time of writing, which is no longer the case.

The first link just says that Redshift has trouble optimizing queries that involve views, but it shows no benchmark or proof of that, nor does it explain why or in what way.

The second and third sources have more merit in that they actually provide alternatives: creating an actual table or materializing the view.

My understanding is that views in Redshift don't inherently suffer from bad performance. Instead, because a plain view is transient and stores no data of its own, it doesn't take advantage of Redshift's clustered architecture. Additionally, as some of the resources you linked mention, the query that makes up a view is executed every time you query the view, and that definitely doesn't help performance. I would suggest aggregating your data into actual tables or looking into materializing these views.
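As a rough sketch of those two alternatives, assuming a hypothetical base table `public.sales` with columns `sale_date`, `region` and `amount` (none of these names come from the question), the aggregation could be stored either as a materialized view or as an ordinary table built with CREATE TABLE AS. The distribution and sort keys shown are placeholders, not recommendations.

```sql
-- Hypothetical names throughout; adjust to your own schema.

-- Option 1: a materialized view stores the precomputed result.
CREATE MATERIALIZED VIEW public.daily_sales_mv AS
SELECT sale_date, region, SUM(amount) AS total_amount
FROM public.sales
GROUP BY sale_date, region;

-- Bring the stored result up to date after the base table changes.
REFRESH MATERIALIZED VIEW public.daily_sales_mv;

-- Option 2: an ordinary table built with CREATE TABLE AS,
-- with its own distribution key and sort key (placeholder choices).
CREATE TABLE public.daily_sales
DISTKEY (region)
SORTKEY (sale_date)
AS
SELECT sale_date, region, SUM(amount) AS total_amount
FROM public.sales
GROUP BY sale_date, region;
```

With the second option you have to rebuild or incrementally load the table yourself, whereas the materialized view can be refreshed in place.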

To better understand how the planner works, I'd take a look at the "Query planning and execution workflow" page in the Redshift documentation.

Redshift has no problem working with views. The logic of the view is combined with the rest of the query that calls the view, similar to a subquery or CTE. Redshift plans and optimizes the entire statement (outer query + view logic) as a single statement.
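One way to check this yourself is to compare the plan of a query that goes through a view with the plan of the same logic written as an inline subquery. The sketch below assumes a hypothetical table `public.sales` and a hypothetical view `recent_sales`; if the view is merged into the outer query as described, the two plans should look very similar.

```sql
-- Hypothetical table and view names, for illustration only.
CREATE VIEW recent_sales AS
SELECT region, sale_date, amount
FROM public.sales
WHERE sale_date >= '2024-01-01';

-- Plan for a query that goes through the view.
EXPLAIN
SELECT region, SUM(amount) AS total_amount
FROM recent_sales
WHERE region = 'EU'
GROUP BY region;

-- Plan for the same logic written as an inline subquery.
EXPLAIN
SELECT region, SUM(amount) AS total_amount
FROM (
    SELECT region, sale_date, amount
    FROM public.sales
    WHERE sale_date >= '2024-01-01'
) AS s
WHERE region = 'EU'
GROUP BY region;
```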

There are two main "issues" that people have with views:

  1. Views are bound to the tables (or other views) that they reference. You cannot drop those objects, or make certain changes to them, without first dropping the view. To address this, Redshift offers the WITH NO SCHEMA BINDING syntax, so that the view is not bound to the objects it references. The trade-off is that the view is not checked when it is created, and queries against it may fail if the underlying objects are changed.
  2. Views make it very easy to generate extremely complex and inefficient queries that look "simple". This happens particularly when you nest views on top of views. You can use EXPLAIN to see the query plan that Redshift will use for a given query, and so see how your view is processed.
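Both points can be sketched together. The example below assumes a hypothetical table `public.sales` with columns `region` and `amount`, and the threshold value is arbitrary. It creates two late-binding views, one stacked on the other, and then asks for the plan of a query that looks trivial. Note that a view created WITH NO SCHEMA BINDING has to reference its tables and views by schema-qualified name.

```sql
-- Hypothetical names throughout.

-- A late-binding view: not bound to public.sales, so the table can be
-- dropped or altered without dropping the view first.
CREATE VIEW public.sales_by_region AS
SELECT region, SUM(amount) AS total_amount
FROM public.sales
GROUP BY region
WITH NO SCHEMA BINDING;

-- A second view stacked on top of the first one.
CREATE VIEW public.top_regions AS
SELECT region, total_amount
FROM public.sales_by_region
WHERE total_amount > 100000
WITH NO SCHEMA BINDING;

-- The query looks simple, but the plan covers the logic of both views.
EXPLAIN
SELECT * FROM public.top_regions;
```

If `public.sales` is later changed in a way the view logic does not expect (for example, a referenced column is removed), the error only shows up when the view is queried.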
