How to do a nested SQL SELECT COUNT
I'm querying a system that won't allow using DISTINCT, so my alternative is to do a GROUP BY to get close to the same result.
My desired query was meant to look like this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT(column3)) AS column3
FROM table
For the alternative, I think I'd need some type of nested query along the lines of this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(SELECT column FROM table GROUP BY column) AS column3
FROM table
But it didn't work. Am I close?
You are using the wrong syntax for COUNT(DISTINCT). The DISTINCT part is a keyword, not a function. Based on the docs, this ought to work:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT column3) AS column3
FROM table
Do, however, read the docs. BigQuery's implementation of COUNT(DISTINCT) is a bit unusual, apparently so as to scale better for big data. If you are trying to count a large number of distinct values then you may need to specify a second parameter (and you have an inherent scaling problem).
Update:
If you have a large number of distinct column3 values to count, and you want an exact count, then perhaps you can perform a join instead of putting a subquery in the select list (which BigQuery seems not to permit):
SELECT *
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
)
CROSS JOIN (
SELECT count(*) AS column3
FROM (
SELECT column3
FROM table
GROUP BY column3
)
)
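The cross-join variant above can be checked on a toy dataset. The sketch below is only an illustration: it runs against SQLite through Python's sqlite3 module rather than BigQuery, the table name t and its four rows are made up, and the derived-table aliases (totals, g, distinct_count) are added because many engines require them.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 20, "a"), (3, 30, "b"), (4, 40, "c")],
)

# One-row totals cross-joined with a one-row count of the GROUP BY groups.
cross_join_sql = """
SELECT *
FROM (
  SELECT SUM(column1) AS column1, SUM(column2) AS column2
  FROM t
) AS totals
CROSS JOIN (
  SELECT COUNT(*) AS column3
  FROM (SELECT column3 FROM t GROUP BY column3) AS g
) AS distinct_count
"""
cross_join_row = conn.execute(cross_join_sql).fetchone()

# Reference result from an engine that does allow COUNT(DISTINCT ...).
reference_row = conn.execute(
    "SELECT SUM(column1), SUM(column2), COUNT(DISTINCT column3) FROM t"
).fetchone()

print(cross_join_row, reference_row)
```

On this sample both queries return the same single row, which is the behaviour the join is meant to reproduce.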
Update 2:
Not that joining two one-row tables would be at all expensive, but @FelipeHoffa got me thinking more about this, and I realized I had missed a simpler solution:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(*) AS column3
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
GROUP BY column3
)
This one computes a subtotal of column1 and column2 values, grouping by column3, then counts and totals all the subtotal rows. It feels right.
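As a rough check of the subtotal approach, here is a minimal sketch run against SQLite via Python's sqlite3 module (not BigQuery). The table t, its rows, and the extra groups_excluding_null column are assumptions added for illustration; the sample deliberately includes one row whose column3 is NULL.

```python
import sqlite3

# Hypothetical sample data, including one NULL in column3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 20, "a"), (3, 30, "b"), (4, 40, "c"), (5, 50, None)],
)

# Subtotal per column3 value, then total and count the subtotal rows.
rollup_sql = """
SELECT SUM(column1) AS column1,
       SUM(column2) AS column2,
       COUNT(*) AS groups_including_null,
       COUNT(column3) AS groups_excluding_null
FROM (
  SELECT SUM(column1) AS column1, SUM(column2) AS column2, column3
  FROM t
  GROUP BY column3
) AS subtotals
"""
rollup_row = conn.execute(rollup_sql).fetchone()

# Reference result using COUNT(DISTINCT ...), which skips NULLs.
reference_row = conn.execute(
    "SELECT SUM(column1), SUM(column2), COUNT(DISTINCT column3) FROM t"
).fetchone()

print(rollup_row, reference_row)
```

One difference to keep in mind: GROUP BY puts NULLs into a group of their own, so COUNT(*) over the subtotal rows counts that group, while COUNT(DISTINCT column3) ignores NULLs. Selecting column3 in the inner query and counting it in the outer one, as in the last output column here, is one way to match COUNT(DISTINCT) without dropping rows from the sums.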
FWIW, the way you are trying to use DISTINCT isn't how it's normally used, as it's meant to show unique rows, not unique values for one column in a dataset. GROUP BY is more in line with what I believe you are ultimately trying to accomplish.
Depending upon what you need, you could do one of a couple of things. Using your second query, you would need to modify your subquery to return a count, not the actual values, like:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
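-- count the groups themselves; sum(1) with GROUP BY would return one row per group
(SELECT COUNT(*) FROM (SELECT column FROM table GROUP BY column) AS g) AS column3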
FROM table
Alternatively, you could run a query on top of your initial query, something like this:
SELECT sum(column1), sum(column2), sum(column4) from (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
1 AS column4
FROM table GROUP BY column3)
GROUP BY column4
Edit: The above is generic SQL; I'm not too familiar with Google BigQuery.
You can probably use a CTE:
WITH result as (select column from table group by column)
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM result) AS column3
FROM table
Instead of doing a COUNT(DISTINCT), you can get the same results by running a GROUP BY first, and then counting the results.
For example, to get the number of different words that Shakespeare used, by year:
SELECT corpus_date, COUNT(word) different_words
FROM (
SELECT word, corpus_date
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date
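The same pattern can be tried locally on a tiny stand-in for the Shakespeare sample. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module instead of BigQuery, and the shakespeare table below contains a handful of invented rows, not the real public dataset.

```python
import sqlite3

# Hypothetical miniature of the sample table: one row per word occurrence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shakespeare (word TEXT, corpus_date INTEGER)")
conn.executemany(
    "INSERT INTO shakespeare VALUES (?, ?)",
    [
        ("love", 1595), ("love", 1595), ("death", 1595),
        ("love", 1600), ("crown", 1600), ("crown", 1600), ("sword", 1600),
    ],
)

# GROUP BY word and year first, then count the surviving rows per year.
grouped_sql = """
SELECT corpus_date, COUNT(word) AS different_words
FROM (
  SELECT word, corpus_date
  FROM shakespeare
  GROUP BY word, corpus_date
) AS pairs
GROUP BY corpus_date
ORDER BY corpus_date
"""
grouped_rows = conn.execute(grouped_sql).fetchall()

# Reference: the per-year COUNT(DISTINCT ...) this pattern replaces.
distinct_rows = conn.execute(
    "SELECT corpus_date, COUNT(DISTINCT word) FROM shakespeare "
    "GROUP BY corpus_date ORDER BY corpus_date"
).fetchall()

print(grouped_rows, distinct_rows)
```

The inner GROUP BY collapses repeated (word, corpus_date) pairs to one row each, so the outer count per year matches the distinct count on this sample.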
As a bonus, let's add a column that identifies which books were written in each year:
SELECT corpus_date, COUNT(word) different_words, GROUP_CONCAT(UNIQUE(corpus)) books
FROM (
SELECT word, corpus_date, UNIQUE(corpus) corpus
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date