
How to Hash an Entire Redshift Table?

I want to hash entire Redshift tables in order to check for consistency after upgrades, backups, and other modifications which shouldn't affect table data.

I've found Hashing Tables to Ensure Consistency in Postgres, Redshift and MySQL, but the solution still requires spelling out each column name and type, so it can't be applied to new tables in a generic manner. I'd have to manually change column names and types for every table.

Is there some other function or method by which I could hash / checksum entire tables in order to confirm they are identical? Ideally without spelling out the specific columns and column types of that table.

There is certainly no built-in capability in Redshift to hash whole tables.

Also, I'd be a little careful with the method suggested in that article because, from what I can see, it calculates a hash of all the values in a column but doesn't associate each hashed value with a row identifier. So if Row 1 and Row 2 swapped values in a column, the hash wouldn't change. In other words, it's not strictly calculating an adequate hash (but I could be wrong!).
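To illustrate the swap concern, here is a small Python sketch that runs entirely in memory, not against Redshift. The helper names (`md5_int`, `column_wise_checksum`, `row_wise_checksum`) and the sample rows are made up for this illustration, and the column-wise function is only my reading of what the article does, not a copy of its method.

```python
import hashlib


def md5_int(text):
    """Hash a string with MD5 and keep the first 8 hex digits as an integer."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest()[:8], 16)


def column_wise_checksum(rows):
    """One order-independent sum of value hashes per column.

    Each value is hashed on its own, so nothing ties it to its row.
    """
    n_cols = len(rows[0])
    return tuple(
        sum(md5_int(str(row[i])) for row in rows) for i in range(n_cols)
    )


def row_wise_checksum(rows):
    """Hash each whole row (values joined by a separator), then sum the hashes."""
    return sum(md5_int("\x1f".join(str(v) for v in row)) for row in rows)


# Hypothetical sample data: the second table has the names swapped between rows.
original = [(1, "alice"), (2, "bob")]
swapped = [(1, "bob"), (2, "alice")]

# The column-wise checksum cannot tell the two tables apart.
print(column_wise_checksum(original) == column_wise_checksum(swapped))

# The row-wise checksum hashes each row's values together, so a swap changes it.
print(row_wise_checksum(original) == row_wise_checksum(swapped))
```

Summing the per-row hashes keeps the result independent of row order, which matters because a table has no guaranteed order unless you sort it.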

You could investigate using the new Stored Procedures in Redshift to see whether you can create a generic procedure that would work for any table.
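As a rough sketch of what "generic" could look like, the Python below builds the SQL on the client side instead of in a stored procedure, though the same query-building idea could be ported into one. It only generates query strings and does not connect to anything. The function names are hypothetical, and it assumes that `information_schema.columns`, `MD5`, `STRTOL`, `SUBSTRING` and `COALESCE` behave on your cluster as they do in standard Redshift, and that every column can be cast to `VARCHAR`. Please check those assumptions before relying on it.

```python
def build_column_list_query(schema, table):
    """SQL that lists a table's columns in order (assumes information_schema access)."""
    return (
        "SELECT column_name FROM information_schema.columns "
        f"WHERE table_schema = '{schema}' AND table_name = '{table}' "
        "ORDER BY ordinal_position"
    )


def build_table_hash_query(schema, table, columns):
    """SQL that returns a row count and an order-independent sum of per-row hashes.

    `columns` is the result of running the column-list query above.
    Names are interpolated directly, so pass only trusted identifiers.
    """
    if not columns:
        raise ValueError("columns must not be empty")
    # Cast every column to text and replace NULLs with a marker,
    # so a NULL in one column does not turn the whole row text into NULL.
    parts = [f"COALESCE(CAST(\"{c}\" AS VARCHAR), '<NULL>')" for c in columns]
    row_text = " || '|' || ".join(parts)
    return (
        "SELECT COUNT(*) AS row_count, "
        f"SUM(STRTOL(SUBSTRING(MD5({row_text}), 1, 8), 16)) AS table_hash "
        f'FROM "{schema}"."{table}"'
    )


# Hypothetical table and columns, for illustration only.
print(build_column_list_query("public", "orders"))
print(build_table_hash_query("public", "orders", ["id", "customer", "total"]))
```

You would run the first query, feed the returned column names into the second function, and compare `row_count` and `table_hash` before and after the upgrade or restore. Because each row is hashed as a whole, this avoids the row-swap problem described above, but a sum of truncated hashes is still only a checksum, so a match makes differences unlikely rather than impossible.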

