Can I convert character encodings in an SQL Server stored proc?

For "legacy" reasons, a lot of our data is stored encoded in standard varchar columns, along with the encoding that is used.

I'm working on a bulk upload routine in which I'd like to pass an XML string to a stored procedure (from C#). The XML string would be entirely in Unicode, with each element carrying an attribute that indicates the desired target encoding (e.g. Shift-JIS for Japanese).

Is there a built-in mechanism in SQL Server for doing this kind of conversion in Transact-SQL?

Store the data as Unicode. Also store the desired 'encoding' in another column. Return both the data (Unicode) and the desired encoding to your application. Transform the Unicode data to the desired encoding in the presentation layer, where it belongs.
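A minimal sketch of that layout, for illustration only. The table variable and its column names (`@UploadedText`, `TextValue`, `TargetEncoding`) and the encoding labels are hypothetical and do not come from the question:

```sql
-- Hypothetical table: names and encoding labels are illustrative only.
DECLARE @UploadedText TABLE
(
  Id             INT IDENTITY(1, 1) PRIMARY KEY,
  TextValue      NVARCHAR(100) NOT NULL, -- stored as Unicode, so no code page is involved
  TargetEncoding VARCHAR(50)   NOT NULL  -- label the application uses when it re-encodes
);

INSERT INTO @UploadedText (TextValue, TargetEncoding)
VALUES (N'בליפ', 'windows-1255'),
       (N'如抜範浪偃壅國', 'shift_jis');

-- The application reads both columns and performs the encoding itself.
SELECT TextValue, TargetEncoding
FROM   @UploadedText;
```

With this approach nothing is lost in the database, and the choice of target encoding stays a display concern.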

CAST(field AS varchar) COLLATE your_collation
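A hedged variant of that one-liner, as a sketch rather than a verified recipe: here the COLLATE clause is attached to the Unicode value before the conversion. This rests on my assumption that the conversion to VARCHAR uses the code page of the source expression's collation, which is worth checking on your own instance. The variable name `@Value` is hypothetical.

```sql
DECLARE @Value NVARCHAR(20) = N'בליפ';

-- COLLATE is applied to the Unicode value first, so the conversion to VARCHAR
-- is expected to use that collation's code page (assumption: verify locally).
SELECT CONVERT(VARCHAR(20), @Value COLLATE Hebrew_100_CI_AS) AS Converted,
       DATALENGTH(CONVERT(VARCHAR(20), @Value COLLATE Hebrew_100_CI_AS)) AS ConvertedBytes;
```

Comparing `ConvertedBytes` with the number of characters, and checking the result for `?` placeholders, shows whether the conversion kept the data.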

You should be able to accomplish this if you simply extract the data from the XML using NVARCHAR as the destination datatype. If the collation is specified properly on the column (and it more or less has to be, or you would already have data loss), then the data should be converted to the proper code page:

DECLARE @SourceXML XML = N'
<Test>
  <Row>
    <Something Collation="Hebrew_100_CI_AS">בליפ</Something>
  </Row>
  <Row>
    <Something Collation="Japanese_XJIS_100_CI_AS">如抜範浪偃壅國</Something>
  </Row>
</Test>'; -- the @Collation attribute is not necessary; only there for visual indication

DECLARE @Test TABLE
(
  HebrewCollation VARCHAR(20) COLLATE Hebrew_100_CI_AS,
  Latin1Collation VARCHAR(20) COLLATE Latin1_General_100_CI_AS,
  JapaneseCollation VARCHAR(20) COLLATE Japanese_XJIS_100_CI_AS
);

INSERT INTO @Test ([HebrewCollation], [Latin1Collation], [JapaneseCollation])
  SELECT tab.col.value('(./Something/text())[1]', 'NVARCHAR(100)'),
         tab.col.value('(./Something/text())[1]', 'NVARCHAR(100)'),
         tab.col.value('(./Something/text())[1]', 'NVARCHAR(100)')
  FROM   @SourceXML.nodes(N'/Test/Row') tab(col);

SELECT *,
       DATALENGTH([HebrewCollation]) AS [HebrewColumnBytes],
       DATALENGTH([JapaneseCollation]) AS [JapaneseColumnBytes]
FROM @Test;

Returns:

HebrewCollation  Latin1Collation  JapaneseCollation  HebrewColumnBytes  JapaneseColumnBytes
בליפ
                 ????             ????               4                   4
???????          ???????          如抜範浪偃壅國       7                  14

Result row 1 is on two lines due to a right-to-left vs left-to-right display issue caused by the werbeH ;-)

The "HebrewColumnBytes" value of 4 for row 1 is correct, as the Hebrew_* collations use code page 1255, which is a Single-Byte Character Set. Likewise, the "JapaneseColumnBytes" value of 14 for row 2 is correct, as the Japanese_* collations use code page 932, which is a Double-Byte Character Set.
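To check which code page each collation maps to, the built-in COLLATIONPROPERTY function can be queried. This is a small sketch using the three collations from the example above; the column aliases are my own:

```sql
-- Look up the code page behind each collation used in the test table.
SELECT COLLATIONPROPERTY('Hebrew_100_CI_AS', 'CodePage')         AS HebrewCodePage,
       COLLATIONPROPERTY('Latin1_General_100_CI_AS', 'CodePage') AS Latin1CodePage,
       COLLATIONPROPERTY('Japanese_XJIS_100_CI_AS', 'CodePage')  AS JapaneseCodePage;
```

If the description above holds, the Hebrew and Japanese columns should report 1255 and 932 respectively.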
