
SQL Count distinct number of rows in table in GBQ

I'd like to count the number of distinct rows in a table. I know that I can do that using GROUP BY or by naming all the columns one by one, but I would like to just do:

select count(distinct *) from my_table

Is that possible?

Do SELECT DISTINCT in a derived table (the subquery), then count the number of rows returned.

select count(*) from
(select distinct * from my_table) dt
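
To make the derived-table approach concrete, here is a minimal sketch on inline sample data. The rows and the column names a and b are made up for illustration; they are not from the question. Note that, as far as I know, BigQuery's SELECT DISTINCT cannot return STRUCT or ARRAY columns, so this form assumes the table has only scalar columns.

```sql
-- Hypothetical sample data: three rows, two of them identical.
with my_table as (
  select 1 as a, 'x' as b union all
  select 1, 'x' union all
  select 2, 'y'
)
-- The subquery removes duplicate rows; the outer query counts what is left.
select count(*) as distinct_rows
from (select distinct * from my_table) dt;
```

With this sample data the two identical rows collapse into one, so the query counts 2 distinct rows.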

(Doesn't your table have any primary key?)

You can use to_json_string():

select count(distinct to_json_string(t))
from t;
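
As a sketch of how this behaves, the same idea can be tried on inline sample data. The table contents and the column names a and b are assumptions for illustration. Passing the table alias t serializes the whole row to a JSON string, and COUNT(DISTINCT ...) then compares those strings.

```sql
-- Hypothetical sample data, including a NULL to show it is serialized too.
with my_table as (
  select 1 as a, 'x' as b union all
  select 1, 'x' union all
  select 2, cast(null as string)
)
-- Each row becomes one JSON string; identical rows yield identical strings.
select count(distinct to_json_string(t)) as distinct_rows
from my_table t;
```

Because the comparison is on the serialized string rather than on the columns themselves, this should also work for rows that contain STRUCT or ARRAY columns, which SELECT DISTINCT rejects.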

Below are more options for BigQuery Standard SQL:

select count(distinct format('%t', t))
from `project.dataset.table` t

Depending on your use case, an approximate count can be an even better option:

select approx_count_distinct(format('%t', t))
from `project.dataset.table` t

APPROX_COUNT_DISTINCT returns the approximate result for COUNT(DISTINCT expression). The value returned is a statistical estimate, not necessarily the actual value. This function is less accurate than COUNT(DISTINCT expression), but performs better on huge input.
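
To judge whether the estimate is close enough for a given table, one option is to compute both values in a single query and compare them. This is only a sketch; `project.dataset.table` is the same placeholder name used above, and the output column names are made up.

```sql
-- Exact and approximate distinct-row counts side by side.
select
  count(distinct format('%t', t)) as exact_distinct_rows,
  approx_count_distinct(format('%t', t)) as approx_distinct_rows
from `project.dataset.table` t
```

Running the exact count gives up the performance benefit, so this comparison is mainly useful as a one-off check, perhaps on a sample of the data.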

The use of count(distinct *) is not permitted.

Alternatively, you could explicitly name the columns that define uniqueness.
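
A minimal sketch of the explicit form, assuming for illustration that uniqueness is defined by two hypothetical columns named col_a and col_b:

```sql
-- col_a and col_b are placeholder names; list the columns that define a unique row.
select count(*) as distinct_rows
from (
  select distinct col_a, col_b
  from my_table
) dt;
```

Rows that differ only in columns not listed here count as one, which can be either the point or a pitfall depending on what "distinct" should mean for the table.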
