Is the order of the result of SELECT DISTINCT ... WHERE ... "random"?

I have an SQL query that reads:

SELECT DISTINCT [NR] AS K_ID 
FROM [DB].[staging].[TABLE]
WHERE [N]=1 and [O]='XXX' and [TYPE] in ('1_P', '2_I')

Since I'm saving the result in a CSV file (via Python Pandas) that is under version control, I've noticed that the order of the result changes every time I run the query. To rule out the Python part, I ran the query in MS SQL Server Management Studio, where I also observe a different order on every attempt.

It doesn't matter in my case, but is it correct that the result of the query can be ordered differently on every execution? And if so, is there a way to make the order "deterministic"?

SQL databases are based on relational algebra and set theory, in which what you think of as tables are more formally called unordered relations. Unless you specify an ORDER BY, the database is free to return the data in whatever order is convenient.

This order might match an index rather than the order on disk. It might also start in the middle of the data if the database can piggyback on a scan that another query already has in progress, reducing the total reads for both queries (SQL Server Enterprise Edition does this; the feature is called advanced scanning).
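A toy model of that shared-scan behaviour may help. It assumes a table stored on four pages and a hypothetical function `shared_scan_order`; it is not how SQL Server implements shared scans, only an illustration of why the starting point, and therefore the row order, can vary between runs:

```python
from typing import List, Sequence


def shared_scan_order(pages: Sequence[List[int]], join_at_page: int) -> List[int]:
    """Simulate a scan that attaches to another scan already at `join_at_page`.

    The new scan first reads the pages the running scan still has left
    (join_at_page .. end), then wraps around to read the pages it missed
    (0 .. join_at_page - 1). Every row is returned exactly once, but where
    the output starts depends on timing.
    """
    n = len(pages)
    order: List[int] = []
    for i in range(n):
        order.extend(pages[(join_at_page + i) % n])
    return order


# Hypothetical table whose rows are stored on four pages.
pages = [[1, 2], [3, 4], [5, 6], [7, 8]]

alone = shared_scan_order(pages, 0)      # no other scan was running
piggyback = shared_scan_order(pages, 2)  # another query was halfway through

print(alone)      # [1, 2, 3, 4, 5, 6, 7, 8]
print(piggyback)  # [5, 6, 7, 8, 1, 2, 3, 4]
assert sorted(alone) == sorted(piggyback)  # same rows, different order
```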

Worse, even the order on disk might change. If there's no primary key (in SQL Server terms, no clustered index, so the table is stored as a heap), the database can even move pages around to help things run more efficiently.

In other words, if the order matters (and it usually does), specify an ORDER BY clause.
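A minimal sketch of the fix, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (the table name `staging_table` and the sample rows are made up for illustration): the query from the question with an ORDER BY on the outermost SELECT, written to CSV the way a version-controlled file would be.

```python
import csv
import io
import sqlite3

# In-memory stand-in for [DB].[staging].[TABLE]; name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_table (NR INTEGER, N INTEGER, O TEXT, [TYPE] TEXT)")
conn.executemany(
    "INSERT INTO staging_table VALUES (?, ?, ?, ?)",
    [
        (42, 1, "XXX", "1_P"),
        (7, 1, "XXX", "2_I"),
        (42, 1, "XXX", "2_I"),  # duplicate NR, removed by DISTINCT
        (19, 1, "XXX", "1_P"),
        (3, 0, "XXX", "1_P"),   # filtered out by N = 1
    ],
)

# The original query plus an ORDER BY on the outermost SELECT.
query = """
SELECT DISTINCT [NR] AS K_ID
FROM staging_table
WHERE [N] = 1 AND [O] = 'XXX' AND [TYPE] IN ('1_P', '2_I')
ORDER BY K_ID
"""
rows = conn.execute(query).fetchall()

# Write the result as CSV; with a fixed order the file only changes when the data does.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["K_ID"])
writer.writerows(rows)
print(buf.getvalue(), end="")
```

Because DISTINCT makes K_ID unique, sorting by it fixes every row's position. If you sort by a column that can contain duplicates, rows with equal values may still swap places between runs, so add a unique tie-breaker column to the ORDER BY. If the query cannot be changed, sorting on the pandas side before saving, for example with `df.sort_values("K_ID").to_csv(path, index=False)`, has the same effect on the file.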

SQL queries return results as an unordered set, unless the outermost query has an ORDER BY.

On small amounts of data, the results often look repeatable. However, on larger systems -- and particularly on parallel systems -- the ordering may depend on hashing algorithms, on when nodes complete, and on congestion on the network (among other factors). So you can in fact see a different ordering each time you run the query.
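The effect of hashing and completion order can be sketched with a toy model of a parallel DISTINCT. The worker count, the hash partitioning, and the `completion_order` parameter are all assumptions for illustration; real engines are far more involved, but the principle is the same: each worker deduplicates its own hash bucket, and the output is assembled in whatever order the workers happen to finish.

```python
from typing import Dict, List, Sequence, Set


def parallel_distinct(values: Sequence[int], workers: int,
                      completion_order: Sequence[int]) -> List[int]:
    """Toy model of a parallel DISTINCT.

    Each value is routed to a worker by hash, each worker removes its own
    duplicates, and the per-worker results are concatenated in the order
    in which the workers finish (`completion_order`).
    """
    buckets: Dict[int, List[int]] = {w: [] for w in range(workers)}
    seen: Dict[int, Set[int]] = {w: set() for w in range(workers)}
    for v in values:
        w = hash(v) % workers  # hash partitioning; hash() of a small int is the int itself
        if v not in seen[w]:
            seen[w].add(v)
            buckets[w].append(v)
    out: List[int] = []
    for w in completion_order:
        out.extend(buckets[w])
    return out


rows = [42, 7, 42, 19, 8, 7, 3]
run_a = parallel_distinct(rows, workers=2, completion_order=[0, 1])
run_b = parallel_distinct(rows, workers=2, completion_order=[1, 0])
print(run_a)  # [42, 8, 7, 19, 3]
print(run_b)  # [7, 19, 3, 42, 8]
assert sorted(run_a) == sorted(run_b)  # same result set
```

Sorting either list gives [3, 7, 8, 19, 42], which is why an ORDER BY on the outermost query is the only reliable way to get a stable order.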
