
How to determine the character set of a Firebird database

I've read the following thread and was able to write a conversion script (in C#) that converts all my charset=NONE databases to charset=UTF8. Most of it works well (there are still a few special cases where characters are converted to odd symbols, but those are marginal).

My issue is that I have lots of database backup files (*.fbk) for which I don't know for sure whether they are UTF8 or NONE. Ideally, my code would handle the conversion once the database has been restored from file, depending on the fbk file's format, so that I only convert when necessary.

Is this at all possible? Or is there a way to define the charset when restoring the database (either via gbak or via the ADO.NET provider)?

In general, a Firebird database does not have a single character set. Each and every column can have its own character set. So the only thing you can do is try and use heuristics.

  1. Use the database default character set. To be clear, the database default character set is only used when creating a new column when no explicit character set is specified. It is entirely possible for a database to have default character set UTF8, while all columns have character set WIN1251!

    You can find the database default character set with the following query:

     select RDB$CHARACTER_SET_NAME from RDB$DATABASE

    NOTE: If the result is NULL, then the default character set is NONE.

  2. Count the different character sets of CHAR, VARCHAR and BLOB SUB_TYPE TEXT columns to see which occurs most:

     select
       coalesce(cs.RDB$CHARACTER_SET_NAME, 'NONE') as CHARSET,
       count(*) as CHARSET_COUNT
     from RDB$RELATIONS r
     inner join RDB$RELATION_FIELDS rf
       on rf.RDB$RELATION_NAME = r.RDB$RELATION_NAME
     inner join RDB$FIELDS f
       on f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE
     left join RDB$CHARACTER_SETS cs
       on cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID
     where coalesce(r.RDB$SYSTEM_FLAG, 0) = 0
       and r.RDB$VIEW_BLR is null
       and (f.RDB$FIELD_TYPE in (14, 37)
            or (f.RDB$FIELD_TYPE = 261 and f.RDB$FIELD_SUB_TYPE = 1))
     group by 1
     order by 2 desc
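If you want to automate the decision after a restore, the two heuristics above could be combined roughly as follows. This is only a sketch in Python rather than your C#; the function names (`normalize_charset`, `dominant_charset`, `needs_conversion`) are made up for illustration, and it assumes you have already fetched the result of the first query as a single value and the result of the second query as `(charset, count)` rows. The system table columns are CHAR, so drivers typically return the names padded with trailing spaces, which is why the sketch strips them.

```python
from typing import Dict, Iterable, Optional, Tuple


def normalize_charset(name: Optional[str]) -> str:
    """Clean up a raw RDB$CHARACTER_SET_NAME value; NULL is treated as NONE."""
    if name is None:
        return "NONE"
    cleaned = name.strip().upper()  # drop CHAR padding
    return cleaned or "NONE"


def dominant_charset(rows: Iterable[Tuple[Optional[str], int]]) -> Optional[str]:
    """Return the character set with the highest column count, or None if no rows."""
    totals: Dict[str, int] = {}
    for name, count in rows:
        key = normalize_charset(name)
        totals[key] = totals.get(key, 0) + count
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])[0]


def needs_conversion(
    default_charset: Optional[str],
    rows: Iterable[Tuple[Optional[str], int]],
    target: str = "UTF8",
) -> bool:
    """Heuristic: convert when the most common column charset is not the target.

    Falls back to the database default charset when there are no
    user-defined text columns at all.
    """
    dominant = dominant_charset(rows)
    if dominant is None:
        return normalize_charset(default_charset) != target
    return dominant != target
```

For example, `needs_conversion("UTF8", [("WIN1251", 5)])` reports that a conversion is needed even though the database default is UTF8, which is exactly the case described in point 1. Whether "most common" is the right threshold for your databases is a judgment call; a stricter variant could require that every column already uses the target character set.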

As an aside, be aware that if clients have used connection character set NONE, then it is entirely possible that the actual character set of contents of a column may not match the defined character set of that column.
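Because of that mismatch, it can be worth sampling the stored bytes themselves rather than trusting the metadata. The sketch below is one possible check, not something Firebird provides: `classify` is a hypothetical helper, and it assumes you have obtained the raw, untransliterated bytes of a value (for instance by reading with connection character set NONE). It can only tell you that the bytes are pure ASCII, are valid UTF-8, or are something else; it cannot identify which single-byte character set "something else" is.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def classify(raw: bytes) -> str:
    """Classify raw column bytes as 'ascii', 'utf8' or 'other'."""
    if raw.isascii():
        # Pure ASCII is identical in UTF-8 and in most single-byte
        # character sets, so it tells you nothing about the encoding.
        return "ascii"
    return "utf8" if looks_like_utf8(raw) else "other"
```

Values classified as "other" in a column that is declared UTF8 (or as "utf8" in a column declared NONE) are the ones most likely to end up as odd symbols after conversion.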
