
How to “LIMIT” the amount of data fetched in SQL

I'm querying multiple tables joined together:

SELECT a.column, b.column, c.column 
FROM t1 AS a, t2 AS b, t3 AS c
WHERE a.column = b.column AND b.column = c.column

Is there any way to limit the amount of data scanned, so it doesn't query the entire dataset? Note that filtering by date/time or some other condition is not possible.

I know that if you put LIMIT 100 it can still query the entire result set. Is there a way to literally just query a random set of 100 rows and return them (cutting down on query time and workload)?

In databases that support LIMIT, the LIMIT applies to the result set, not to the data being scanned. The SQL optimizer is free to choose whatever execution plan it wants. It can take the LIMIT into account, so that the query is optimized for "time to first row" rather than "time to last row".

Next, you should be using proper, explicit, standard, readable JOIN syntax. If you want to limit the amount of data read, then you can put LIMIT in a subquery:

SELECT a.column, b.column, c.column
FROM (SELECT t1.*
      FROM t1
      LIMIT 100
     ) a
JOIN t2 b
     ON a.column = b.column
JOIN t3 c
     ON c.column = b.column;  -- or whatever

Note: This is not guaranteed to return 100 rows, but it should limit the scanning of t1, which may or may not be relevant. Also, one of your original tags was BigQuery, and there merely limiting the number of rows scanned has no impact on performance (as opposed to pruning partitions).
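To make the note about row counts concrete, here is a minimal sketch that runs the subquery form against an in-memory SQLite database from Python. SQLite is only a stand-in chosen so the example is runnable. The key column `k` and the sample data are hypothetical, and how much scanning is actually avoided depends on your database and its optimizer.

```python
import sqlite3

# Hypothetical sample data: t1 and t3 hold keys 1..1000,
# t2 holds only the multiples of 3 (333 keys).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for name in ("t1", "t2", "t3"):
    cur.execute(f"CREATE TABLE {name} (k INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(1, 1001)])
cur.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(3, 1001, 3)])
cur.executemany("INSERT INTO t3 VALUES (?)", [(i,) for i in range(1, 1001)])

# LIMIT inside the subquery: at most 100 rows of t1 reach the joins.
limited = cur.execute("""
    SELECT a.k, b.k, c.k
    FROM (SELECT t1.* FROM t1 LIMIT 100) a
    JOIN t2 b ON a.k = b.k
    JOIN t3 c ON c.k = b.k
""").fetchall()

# The same join without any LIMIT, for comparison.
full_count = cur.execute("""
    SELECT COUNT(*)
    FROM t1 a
    JOIN t2 b ON a.k = b.k
    JOIN t3 c ON c.k = b.k
""").fetchone()[0]

print("rows with subquery LIMIT:", len(limited))
print("rows without LIMIT:", full_count)
conn.close()
```

Because only some of the 100 rows taken from t1 have a match in t2, the limited query returns fewer than 100 rows here, even though the unrestricted join has more than 100 matches available.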

I should also note that LIMIT is usually used with ORDER BY, so the result set is stable. That is, ORDER BY would make it consistent from one run to the next, rather than returning an indeterminate (but not random) 100 rows.
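As a rough illustration of the ORDER BY point, the sketch below again uses an in-memory SQLite database from Python, with a hypothetical table and column `k`. With ORDER BY, the rows returned by LIMIT are fully determined by the data. Without it, the database may return any rows it likes, so the second query's result is printed but should not be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table filled in descending order, so insertion order
# differs from sort order.
cur.execute("CREATE TABLE t1 (k INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(1000, 0, -1)])

# ORDER BY pins down which rows LIMIT keeps: always the five smallest keys.
ordered_sql = "SELECT k FROM t1 ORDER BY k LIMIT 5"
first_run = [row[0] for row in cur.execute(ordered_sql)]
second_run = [row[0] for row in cur.execute(ordered_sql)]

# No ORDER BY: some five rows, with no guarantee about which ones.
unordered = [row[0] for row in cur.execute("SELECT k FROM t1 LIMIT 5")]

print("ordered:", first_run, second_run)
print("unordered (indeterminate):", unordered)
conn.close()
```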
