At the company where I work, they are about to migrate from the legacy DB2 database to Snowflake.
Database Configuration for Database DWPROD
Database territory = US
Database code page = 819
Database code set = ISO8859-1
LANG=en_US
The target database has been left at its default configuration, meaning UTF-8 collation. We already had to trim all text columns before loading the data into Snowflake, because trailing spaces were causing problems with some joins. (On the DB2 side, the collation took care of that.) I've now noticed yet another, obvious, problem, this time with sorting:
Snowflake with UTF-8 sorts upper-case letters before lower-case letters (A-Z first, then a-z). DB2, on the other hand, sorts a, A before b, B, and so on.
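To make the sorting difference concrete, here is a small Python sketch. It is not Snowflake or DB2 code: plain `sorted()` stands in for a code-point (binary) ordering like the default UTF-8 collation described above, and the hypothetical helper `dictionary_key` only approximates the a, A, b, B ordering seen on DB2.

```python
# Illustration only: Python stands in for the two databases here.
words = ["b", "A", "a", "B"]

# Code-point order: every upper-case ASCII letter sorts before every
# lower-case one, which matches the A-Z then a-z behaviour described above.
binary_order = sorted(words)


def dictionary_key(value: str):
    """Hypothetical key approximating the a, A, b, B ordering.

    Primary key: the case-folded text, so 'a' and 'A' sort together.
    Tie-breaker: swapcase() puts lower case ahead of upper case.
    """
    return (value.casefold(), value.swapcase())


dictionary_order = sorted(words, key=dictionary_key)

print(binary_order)      # ['A', 'B', 'a', 'b']
print(dictionary_order)  # ['a', 'A', 'b', 'B']
```

Any report or application that relies on ORDER BY without an explicit collation or sort key would silently switch from the second ordering to the first after the migration.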
I'm trying to find more examples of what might go wrong, so that I can present them and stop the madness.
I've already collected examples of the issues listed above. I'm hoping (dreaming of) getting some answers from people who have a lot of experience with collation and Unicode. Some might say this is basic stuff, but these days it looks like everybody ignores it. It would also be great if people shared stories of such migrations failing or having to be redone.
It's important to know the limitations of using non-default collation on Snowflake:
https://docs.snowflake.com/en/sql-reference/collation.html#collation-limitations
For me personally, the limitation on UDFs is sufficient reason to avoid changing the default collation. Sometimes there's simply no substitute for a UDF, and when you need one and can't use it with a non-default collation, that is a problem. The reduction in the string size limit from 16 MB to 8 MB and the lack of support for collated strings in arrays, objects, and variants are also major considerations.
You can use trim() to handle trailing/leading spaces and ilike instead of like to handle case sensitivity. For sorting, you may need an extra upper- or lower-cased column to sort on, an age-old way to deal with case-sensitive comparisons in databases.
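The same ideas can be sketched outside the database. The Python below is only an analogy, not Snowflake SQL: `join_key`, `equals_ignore_case`, and the sample names are made up for illustration. `strip()` plays the role of trim(), `casefold()` stands in for a case-insensitive comparison (ilike itself is a pattern match, so this is just the equality case), and `upper()` plays the role of the extra upper-cased sort column.

```python
def join_key(value: str) -> str:
    """Remove leading/trailing spaces, as trim() would before a join."""
    return value.strip()


def equals_ignore_case(left: str, right: str) -> bool:
    """Compare two values ignoring case and surrounding spaces."""
    return join_key(left).casefold() == join_key(right).casefold()


# Hypothetical sample data with a padded value, as a CHAR column might hold.
names = ["bob", "Alice ", "alice", "Bob"]

# Without trimming, the padded value does not match its clean counterpart.
untrimmed_match = "Alice " == "Alice"

# With trimming and case folding, it matches regardless of case or padding.
trimmed_match = equals_ignore_case("Alice ", "ALICE")

# Sorting on an upper-cased key keeps the same letter together regardless
# of case; ties keep their original relative order (sorted() is stable).
sorted_names = sorted((join_key(n) for n in names), key=str.upper)

print(untrimmed_match)  # False
print(trimmed_match)    # True
print(sorted_names)     # ['Alice', 'alice', 'bob', 'Bob']
```

Note that an upper-cased key alone does not decide whether "a" or "A" comes first; if that matters, a secondary sort key is needed.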