Is switching from DB2 (en_US collation) to Snowflake (with default collation UTF-8) a good idea?

At the company where I work, they are about to migrate from the legacy DB2 database to Snowflake.

Database Configuration for Database DWPROD
    Database territory                                      = US
    Database code page                                      = 819
    Database code set                                       = ISO8859-1
    LANG=en_US

The target database has been left at its default configuration, meaning UTF-8 collation. We already had to trim all text columns before loading the data into Snowflake, because trailing spaces were causing problems with some joins. (On the DB2 side, the collation took care of that.) I've now realized there is yet another, obvious, problem, this time with sorting:
Snowflake with UTF-8 sorts upper case letters before lower case letters (A-Z first, then a-z). DB2, on the other hand, sorts a, A before b, B and so on.
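
To make the sorting difference concrete, here is a small Python sketch. It does not talk to either database: sorting by raw UTF-8 bytes stands in for the binary ordering described for Snowflake, and a hand-written key function (`dictionary_key`, a hypothetical name) only approximates the a, A, b, B order described for DB2. The sample words are made up.

```python
# Illustration only: emulates the two orderings in plain Python,
# it is not a reproduction of either database's collation code.
words = ["banana", "Apple", "apple", "Banana", "Zebra", "zebra"]


def binary_utf8_order(values):
    """Sort by raw UTF-8 bytes, which puts A-Z before a-z."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


def dictionary_key(s):
    """Assumed approximation of a dictionary-style collation:
    compare ignoring case first, then put lower case before upper case."""
    return (s.lower(), s.swapcase())


def dictionary_order(values):
    return sorted(values, key=dictionary_key)


print(binary_utf8_order(words))
print(dictionary_order(words))
```

Any report, pagination query, or `MIN`/`MAX` over text that implicitly relied on the second ordering is a candidate for a behaviour change after the migration.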

I'm trying to find more examples of what might go wrong, so I can present them and stop the madness.

I've already collected examples of the issues listed above. I'm expecting (dreaming of) getting some answers from people who have a lot of experience with collation and Unicode. Some could say this is basic stuff, but these days it looks like everybody ignores it. It would also be great to share some stories here about such migrations failing or needing to be redone.

It's important to know the limitations of using non-default collation on Snowflake:

https://docs.snowflake.com/en/sql-reference/collation.html#collation-limitations

For me personally, the limitation on UDFs is sufficient reason to avoid changing the default collation. Sometimes there's simply no substitute for a UDF, and when you need one and can't use it with the non-default collation, that is a problem. The reduction of the string size limit from 16 MB to 8 MB and the lack of support for collated strings in arrays, objects, and variants are also major considerations.

You can use trim() to handle leading/trailing spaces, and ilike instead of like to handle case sensitivity. For sorting, you may need an extra upper- or lower-cased column, an age-old way to deal with case-sensitive comparisons in databases.
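
A rough sketch of that idea in Python, again without touching Snowflake: `normalize_key` (a hypothetical helper) plays the role of an `upper(trim(col))` expression used as a precomputed join or sort column. The rows are invented for illustration.

```python
# Illustration only: a normalised key stands in for an extra upper/trimmed column.
def normalize_key(value):
    """Strip leading/trailing spaces and fold case, like upper(trim(col)) in SQL."""
    return value.strip(" ").upper()


customers = [("ACME  ", 1), ("Globex", 2)]  # padded, as if exported from a CHAR column
orders = [("acme", "order-1"), ("GLOBEX ", "order-2")]

# A naive equality join finds nothing because of padding and case differences.
naive = [(c, o) for c, _ in customers for o, _ in orders if c == o]

# Joining on the normalised key matches both rows.
by_key = {normalize_key(name): cid for name, cid in customers}
joined = [
    (by_key[normalize_key(name)], order)
    for name, order in orders
    if normalize_key(name) in by_key
]

print(naive)
print(joined)
```

The trade-off is the usual one: the normalised column has to be maintained alongside the original, but comparisons on it behave the same under binary ordering as they would under a case-insensitive collation for plain ASCII data.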
