
SQL Server unexpected 'order by' difference for varchar vs. nvarchar

I have encountered an unexpected difference between the results of 'order by' with varchar vs. nvarchar data. In both cases the data in question is from the old ASCII character set; the difference occurs in ordering data beginning with nnn vs. -nnn, where n is a digit.

Below is a SQL Server script that reproduces the problem; my test server is SQL Server 2016, but I have reproduced the problem in 2008 and 2012 as well. I have tried different collations with no effect (except Latin1_General_bin, see below). The script creates 2 sample tables similar to one in our application, one using varchar and the other nvarchar, and adds 7 rows of data to each.

CREATE TABLE [dbo].[_ValidationLists](
    [_FldNum] [int] NOT NULL,
    [_ValidationEntry] [varchar](250) NOT NULL,
 CONSTRAINT [PK__ValidationLists] PRIMARY KEY CLUSTERED 
(
    [_ValidationEntry] ASC,
    [_FldNum] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[_ValidationListsN](
    [_FldNum] [int] NOT NULL,
    [_ValidationEntry] [nvarchar](250) NOT NULL,
 CONSTRAINT [PK__ValidationListsN] PRIMARY KEY CLUSTERED 
(
    [_ValidationEntry] ASC,
    [_FldNum] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

INSERT INTO [_ValidationLists] (_fldnum, _ValidationEntry) VALUES (1,'-1'), (1,'-10'), (1,'-100'), (1,'0'), (1,'1'), (1,'10'), (1,'100')
INSERT INTO [_ValidationListsN] (_fldnum, _ValidationEntry) VALUES (1,N'-1'), (1,N'-10'), (1,N'-100'), (1,N'0'), (1,N'1'), (1,N'10'), (1,N'100')


select * from [_ValidationLists]
order by [_ValidationEntry] asc
select * from [_ValidationListsN]
order by [_ValidationEntry] asc

The results of the select statements are below. The first result, for the varchar table, is what I expect (lexicographic sort); the second result I cannot explain. The first is also what our customer base expects, and we were caught by surprise by the second. (The customer data is unusual: typically this table is used for alpha data, and alpha data orders identically for both varchar and nvarchar.)

Results are identical using N'...' to initialize the _ValidationListsN rows. The original data had longer entries, such as '-100:Pass'; I have reduced the data to the minimum that demonstrates the problem.

Right-padding with blanks so that all entries are the same length has no effect.

Using COLLATE Latin1_General_bin reproduces lexicographic sorting, but is not acceptable because (to give just one reason) we generally use case-insensitive sorts.

The customer who reported this problem has only ASCII data, so we can fix it for them by recreating this table using varchar. I would love to know why nvarchar behaves this way, since the results seem incorrect to me, and whether there is a way to get the ordering behavior we expect (the first case). At the least, I have no idea why all entries that begin with '-' (ASCII 0x2D, dash or minus sign) do not order together.

_FldNum _ValidationEntry

1           -1  
1           -10  
1           -100  
1           0  
1           1  
1           10  
1           100  

(7 row(s) affected)

_FldNum _ValidationEntry

1           0  
1           1  
1           -1  
1           10  
1           -10  
1           100  
1           -100  

(7 row(s) affected)

When SQL Server resolves ORDER BY, it decides the order of rows not only from the collation but also, as you already discovered, from the data type.

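Both varchar and nvarchar are, in the end, just bytes. The "-" sign has the same code point in both (0x2D in ASCII, U+002D in Unicode), and that code point precedes the digits (0x30 to 0x39), so the stored bytes alone do not explain the difference; any ASCII/Unicode table confirms this. What differs is the comparison rules. With a SQL collation (a name beginning with SQL_), varchar is compared using legacy non-Unicode rules, which treat "-" as an ordinary character sorting ahead of the digits, while nvarchar is always compared using Unicode rules, which give the hyphen very little weight (a "word sort").

Because of that, under the Unicode rules '-1' sorts next to '1' rather than with the other entries beginning with '-', which is the interleaved order you see for nvarchar. Only a binary collation compares raw code points, which is why Latin1_General_bin puts the negative values first.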
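One way to test whether the sort rules, rather than the stored bytes, account for the difference is to sort the same varchar values under two collations. The sketch below is only an illustration: the table variable @t and the two collation names are my own choices, not something taken from the question. If the sort rules are responsible, the first query should list the '-' entries together and the second should interleave them the way the nvarchar table does.

```sql
-- Hypothetical table variable holding the same seven values as varchar.
DECLARE @t TABLE (v varchar(10) NOT NULL);

INSERT INTO @t (v)
VALUES ('-1'), ('-10'), ('-100'), ('0'), ('1'), ('10'), ('100');

-- SQL collation: varchar is compared with the legacy non-Unicode rules.
SELECT v
FROM @t
ORDER BY v COLLATE SQL_Latin1_General_CP1_CI_AS;

-- Windows collation: varchar is compared with the same Unicode-based
-- rules that nvarchar uses.
SELECT v
FROM @t
ORDER BY v COLLATE Latin1_General_CI_AS;
```

Both collations are case-insensitive, so any difference between the two result sets comes from how the hyphen is weighted, not from case handling.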

If you are interested in how the data is actually stored, the book "Microsoft SQL Server Internals" may be useful to you. It has a whole section discussing this topic.

EDIT:

To see how the data is actually stored in the database, see this snippet:
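For a quick look at the stored values without the book, the code points and raw bytes can be queried directly. This is a minimal sketch using literals rather than the tables from the question; ASCII, UNICODE and CONVERT are built-in functions, and the column aliases are just illustrative names.

```sql
-- Code points of the hyphen-minus and the digit zero in both type families.
SELECT
    ASCII('-')    AS HyphenVarchar,   -- 45 (0x2D)
    UNICODE(N'-') AS HyphenNvarchar,  -- 45 (U+002D)
    ASCII('0')    AS ZeroVarchar,     -- 48 (0x30)
    UNICODE(N'0') AS ZeroNvarchar;    -- 48 (U+0030)

-- Raw bytes: nvarchar is stored as UTF-16 little-endian, so each ASCII
-- character is followed by a 0x00 byte.
SELECT
    CONVERT(varbinary(8), '-1')  AS VarcharBytes,   -- 0x2D31
    CONVERT(varbinary(8), N'-1') AS NvarcharBytes;  -- 0x2D003100
```

In both encodings the hyphen has a lower value than any digit, so a purely byte-wise comparison would place the '-' entries first for either type.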

SELECT 
   [_ValidationEntry], 
   CONVERT(binary(6), [_ValidationEntry]) AS [BinaryRepresentation] 
FROM [_ValidationLists]
--ORDER BY [_ValidationEntry] COLLATE Latin1_General_bin

SELECT 
   [_ValidationEntry], 
   CONVERT(binary(6), [_ValidationEntry]) AS [BinaryRepresentation] 
FROM [_ValidationListsN]
--ORDER BY [_ValidationEntry] COLLATE Latin1_General_bin

The result of the query looks like this:

0      0x300000000000
1      0x310000000000
-1     0x2D3100000000
10     0x313000000000
-10    0x2D3130000000
100    0x313030000000
-100   0x2D3130300000

With a BIN collation, the column is ordered by its binary representation, so values beginning with the minus sign (0x2D) come first.
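If a binary collation is ruled out only because it is case-sensitive, two possible workarounds for the nvarchar table are sketched below. Neither comes from the original question; both are assumptions to verify against your own data. Because each one wraps the column in an expression, the clustered index probably cannot supply the order and a sort step is likely.

```sql
-- Option 1: fold case first, then compare by code point.
-- Case-insensitive for plain letters; accented characters still sort
-- strictly by code point.
SELECT [_FldNum], [_ValidationEntry]
FROM [dbo].[_ValidationListsN]
ORDER BY UPPER([_ValidationEntry]) COLLATE Latin1_General_BIN2,
         [_FldNum];

-- Option 2: compare as varchar under a SQL collation.
-- Only suitable when every value fits the collation's code page
-- (pure ASCII data does); other characters may be altered by the conversion.
SELECT [_FldNum], [_ValidationEntry]
FROM [dbo].[_ValidationListsN]
ORDER BY CONVERT(varchar(250), [_ValidationEntry])
             COLLATE SQL_Latin1_General_CP1_CI_AS,
         [_FldNum];
```

Option 1 keeps the data as nvarchar and only changes the comparison, while option 2 depends on the ASCII-only assumption stated in the question.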
