Can I convert character encodings in an SQL Server stored proc?

Question

For "legacy" reasons, a lot of our data is stored encoded in standard varchar columns, alongside the name of the encoding that was used.

I'm working on a bulk upload routine in which I'd like to pass an XML string to a stored procedure (from C#). The XML string would be entirely Unicode, with each element carrying an attribute that indicates the desired target encoding (e.g. Shift-JIS for Japanese).

Is there some built-in mechanism in SQL Server for doing this kind of conversion in Transact-SQL?

Answer 1

Store the data as Unicode, and store the desired 'encoding' in another column. Return both the data (Unicode) and the desired encoding to your application. Transform the Unicode data to the desired encoding in the presentation layer, where it belongs.
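
As a rough sketch of the storage side of this suggestion, the table below keeps the text in an NVARCHAR column and the target encoding as a plain label next to it. The table name, column names, and the 'shift_jis' label are all hypothetical; nothing in SQL Server interprets the label, it is only handed back to the application.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE dbo.UploadedText
(
    UploadedTextId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Content        NVARCHAR(4000) NOT NULL, -- stored as Unicode, no code page involved
    TargetEncoding VARCHAR(50)    NOT NULL  -- label only, e.g. 'shift_jis'
);

-- The N prefix keeps the literal as Unicode on its way into the table.
INSERT INTO dbo.UploadedText (Content, TargetEncoding)
VALUES (N'如抜範浪偃壅國', 'shift_jis');

-- The application reads both columns and does the actual encoding itself.
SELECT Content, TargetEncoding
FROM   dbo.UploadedText;
```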

Answer 2

CAST(field AS varchar) COLLATE <your collation>
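
A minimal sketch of how a COLLATE clause can steer the conversion, assuming the source value is NVARCHAR. Note that it applies COLLATE to the Unicode value before the cast, which is a slightly different placement from the one-liner above; the idea is that the conversion to VARCHAR should then use the code page of the named collation. The variable name is hypothetical.

```sql
-- Hypothetical input; the N prefix marks the literal as Unicode (NVARCHAR).
DECLARE @Input NVARCHAR(20) = N'如抜範浪偃壅國';

-- COLLATE is attached to the NVARCHAR value, so the cast to VARCHAR should
-- use that collation's code page rather than the database default.
SELECT CAST(@Input COLLATE Japanese_XJIS_100_CI_AS AS VARCHAR(20)) AS Converted,
       DATALENGTH(CAST(@Input COLLATE Japanese_XJIS_100_CI_AS AS VARCHAR(20))) AS ConvertedBytes;
```

Characters that do not exist in the target code page are not preserved by this kind of conversion, so it is worth checking the result before relying on it.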

Answer 3

You should be able to accomplish this IF you simply extract the data from the XML using NVARCHAR as the destination datatype. If the Collation is specified properly on the column (and it kinda has to be in order for you to not already have data loss), then it should convert to the proper Code Page:

DECLARE @SourceXML XML = N'
<Test>
  <Row>
    <Something Collation="Hebrew_100_CI_AS">בליפ</Something>
  </Row>
  <Row>
    <Something Collation="Japanese_XJIS_100_CI_AS">如抜範浪偃壅國</Something>
  </Row>
</Test>'; -- the @Collation attribute is not necessary; only there for visual indication

DECLARE @Test TABLE
(
  HebrewCollation VARCHAR(20) COLLATE Hebrew_100_CI_AS,
  Latin1Collation VARCHAR(20) COLLATE Latin1_General_100_CI_AS,
  JapaneseCollation VARCHAR(20) COLLATE Japanese_XJIS_100_CI_AS
);

-- Each value is extracted as NVARCHAR; the implicit conversion to VARCHAR on
-- insert uses the code page of each destination column's collation.
INSERT INTO @Test ([HebrewCollation], [Latin1Collation], [JapaneseCollation])
  SELECT tab.col.value('(./Something/text())[1]', 'NVARCHAR(100)'),
         tab.col.value('(./Something/text())[1]', 'NVARCHAR(100)'),
         tab.col.value('(./Something/text())[1]', 'NVARCHAR(100)')
  FROM   @SourceXML.nodes(N'/Test/Row') tab(col);

SELECT *,
       DATALENGTH([HebrewCollation]) AS [HebrewColumnBytes],
       DATALENGTH([JapaneseCollation]) AS [JapaneseColumnBytes]
FROM @Test;

Returns:

HebrewCollation  Latin1Collation  JapaneseCollation  HebrewColumnBytes  JapaneseColumnBytes
בליפ
                 ????             ????               4                   4
???????          ???????          如抜範浪偃壅國       7                  14

(Result row 1 is on two lines due to a right-to-left vs left-to-right display issue caused by the werbeH ;-)

The "HebrewColumnBytes" value of 4 for Row 1 is correct as the Hebrew_* Collations use Code Page 1255 which is a Single-Byte Character Set. Likewise, the "JapaneseColumnBytes" value of 14 for Row 2 is correct as the Japanese_* Collations use Code Page 932 which is a Double-Byte Character Set.
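
The code pages mentioned here can be checked directly with the built-in COLLATIONPROPERTY function. This is only a quick sanity check, not part of the conversion itself; per the explanation above, the Hebrew and Japanese collations should report 1255 and 932 respectively.

```sql
-- 'CodePage' is one of the properties COLLATIONPROPERTY accepts.
SELECT COLLATIONPROPERTY('Hebrew_100_CI_AS', 'CodePage')         AS HebrewCodePage,
       COLLATIONPROPERTY('Latin1_General_100_CI_AS', 'CodePage') AS Latin1CodePage,
       COLLATIONPROPERTY('Japanese_XJIS_100_CI_AS', 'CodePage')  AS JapaneseCodePage;
```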

3 answers

solution1
1 ACCEPTED 2012-06-26 11:18:59

solution2
0 2012-06-26 11:03:13

solution3
0 2016-05-02 15:25:56
