
Does a redshift materialized view refresh lock the base tables?

As the title suggests, I'm not sure whether the REFRESH MATERIALIZED VIEW command locks the base tables (i.e., blocks reads and writes). The documentation doesn't suggest that it does, and I wouldn't expect it to need to, given that it simply reads them and then updates the view within a transaction. However, I consistently get a serializable isolation violation during the refresh, so I'm trying to figure out what is causing it.

Additional context:

  1. We have a system with four base tables that the materialized view reads from on refresh. Rows are inserted into these base tables throughout the day via the COPY command. AWS Firehose periodically executes the COPY query, which loads the data (in JSON format) from S3. These insertions happen almost constantly, probably at least once every 5 minutes.
  2. The materialized view refresh takes ~7 minutes to complete and runs every 10 minutes. Roughly 1 out of every 4 executions fails.
  3. We also have several QuickSight dashboards backed by SPICE. Those SPICE datasets (~6 datasets) refresh every 15 minutes.

You need two sessions, each performing a write, to create a serialization error. So something else that is important to this issue is going on and isn't described. Just refreshing a materialized view cannot create this issue by itself.

See: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-serializable-isolation/

What else is happening in the transaction that includes the refresh? What other transaction is in flight that is the other half of the issue? If this is happening repeatedly then there is likely some ETL / orchestration process running that is kicking off both transactions in such a way that they often interfere with each other.

If it is not clear why this is happening, you will want to review MVCC: https://en.wikipedia.org/wiki/Multiversion_concurrency_control. MVCC allows many parallel operations to happen on tables without blocking each other. Setting up the circumstances that create a serialization error often takes some care, and resolving a serialization issue often just takes changing one small aspect of what is going on. However, when it is happening, it takes an understanding of the systems involved to figure out which two processes are interfering.

A serializable isolation violation occurs when two transactions write to the database, and one reads from a table that is modified by the other. This situation is described in detail here.
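To make the condition above concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not how Redshift actually detects violations: every transaction is assumed to run concurrently, each is reduced to the set of tables it reads and the set it writes, and all table and transaction names are hypothetical. An edge A -> B means "A read a table that B wrote", so A has to come before B in any serial order; a cycle means no serial order exists.

```python
def must_precede(transactions):
    """Build edges A -> B when A reads a table that B writes.

    `transactions` maps a name to {"reads": set, "writes": set}.
    All transactions are assumed to overlap in time (toy assumption).
    """
    return {
        a: {
            b
            for b in transactions
            if b != a and transactions[a]["reads"] & transactions[b]["writes"]
        }
        for a in transactions
    }


def has_cycle(edges):
    """Depth-first search for a cycle in the dependency graph."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in edges}

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in edges[node]:
            if state[nxt] == IN_PROGRESS:
                return True  # came back to a node on the current path
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNSEEN and visit(n) for n in edges)


# Hypothetical case 1: a refresh reads a base table and writes the view,
# while a load only writes the base table. One edge, no cycle.
refresh_and_load = {
    "refresh": {"reads": {"base_table"}, "writes": {"mv"}},
    "load": {"reads": set(), "writes": {"base_table"}},
}

# Hypothetical case 2: each transaction reads what the other writes.
crossed = {
    "t1": {"reads": {"a"}, "writes": {"b"}},
    "t2": {"reads": {"b"}, "writes": {"a"}},
}

print(has_cycle(must_precede(refresh_and_load)))
print(has_cycle(must_precede(crossed)))
```

In this model the first case has no cycle and the second does, which is why it matters what else each transaction reads and writes besides the refresh itself.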

When a violation occurs in a user transaction, one of the two transactions is aborted with a message like this:

Serializable isolation violation on table - 2342993, transactions forming the cycle are: 104865204, 104866208, 104865323 (pid:20589);

This provides enough information to list the queries of each transaction involved:

SELECT * FROM svl_qlog WHERE xid IN (104865204, 104866208, 104865323) ORDER BY starttime;
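If these errors are being collected from logs, pulling the IDs out by hand gets tedious. The following Python sketch parses a message of the form shown above and builds the same svl_qlog query. It assumes the message wording matches that example exactly; the function names `parse_violation` and `qlog_query` are made up for illustration.

```python
import re


def parse_violation(message):
    """Return (table_id, [xids]) from a serializable isolation violation message."""
    table = re.search(r"violation on table - (\d+)", message)
    cycle = re.search(r"forming the cycle are: ([\d,\s]+)", message)
    if not table or not cycle:
        raise ValueError("not a serializable isolation violation message")
    # The captured group stops before "(pid:...)", so the pid is not included.
    xids = [int(x) for x in re.findall(r"\d+", cycle.group(1))]
    return int(table.group(1)), xids


def qlog_query(xids):
    """Build the svl_qlog lookup for the given transaction IDs."""
    id_list = ", ".join(str(x) for x in xids)
    return f"SELECT * FROM svl_qlog WHERE xid IN ({id_list}) ORDER BY starttime;"


message = (
    "Serializable isolation violation on table - 2342993, "
    "transactions forming the cycle are: 104865204, 104866208, "
    "104865323 (pid:20589);"
)
table_id, xids = parse_violation(message)
print(table_id)
print(qlog_query(xids))
```

The XIDs are converted to integers before being placed in the query text, so nothing other than digits can end up in the generated SQL.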

In the case of a failure to refresh a materialized view, you may find the XID with:

SELECT * FROM SVL_MV_REFRESH_STATUS WHERE status LIKE '%violation%';

This gives you the table ID. Get the name with:

SELECT * FROM pg_class WHERE oid = ...;

However, this is only one side of the story. You are missing the XID that succeeded and interfered with the refresh.

Unfortunately, that's a bit of detective work, and I would start with inspecting queries involving the table:

SELECT * FROM SVL_STATEMENTTEXT
WHERE text LIKE '%your_table_name%'
  AND starttime > ...
  AND endtime < ...;

Good luck!
