SQL: Count distinct number of rows in a table in GBQ
I'd like to count the number of distinct rows in a table. I know that I can do that using GROUP BY or by naming all the columns one by one, but I would like to just do:
select count(distinct *) from my_table
Is that possible?
Do SELECT DISTINCT in a derived table (the subquery), then count the number of rows returned.
select count(*) from
(select distinct * from my_table) dt
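To make the derived-table approach easy to try, here is a minimal sketch that runs without any real table. The inline `my_table` CTE and its `id` and `name` columns are made-up sample data, not anything from the question.

```sql
-- Hypothetical sample data: three rows, two of which are identical.
with my_table as (
  select 1 as id, 'a' as name union all
  select 1, 'a' union all
  select 2, 'b'
)
-- The inner query collapses duplicate rows; the outer query counts what is left.
select count(*) as distinct_rows
from (select distinct * from my_table) dt;
```

With this sample data the query should return 2. Note that in BigQuery, SELECT DISTINCT cannot return STRUCT or ARRAY columns, so this form only works when every column is of a simpler type.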
(Doesn't your table have any primary key?)
You can use to_json_string():
select count(distinct to_json_string(t))
from t;
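A self-contained sketch of the same idea, assuming made-up sample rows in a CTE named `sample` (a hypothetical name). The alias `t` refers to the whole row, so to_json_string(t) serializes every column into a single string that count(distinct ...) can compare.

```sql
-- Hypothetical sample data: three rows, two of which are identical.
with sample as (
  select 1 as id, 'a' as name union all
  select 1, 'a' union all
  select 2, 'b'
)
-- Each row becomes one JSON string; identical rows yield identical strings.
select count(distinct to_json_string(t)) as distinct_rows
from sample t;
```

With this sample data the result should be 2. To inspect the strings being compared, run `select to_json_string(t) from sample t` against the same CTE.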
Below are more options for BigQuery Standard SQL:
select count(distinct format('%t', t))
from `project.dataset.table` t
Depending on your use case, an approximate count can be an even better option:
select approx_count_distinct(format('%t', t))
from `project.dataset.table` t
APPROX_COUNT_DISTINCT returns the approximate result for COUNT(DISTINCT expression). The value returned is a statistical estimate, not necessarily the actual value. This function is less accurate than COUNT(DISTINCT expression), but performs better on huge input.
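To see how the exact and approximate counts compare, the sketch below runs both over generated data. The `sample` CTE and its `bucket` and `label` columns are assumptions for illustration only: 100,000 rows are generated, but only 1,000 of them are distinct.

```sql
-- Hypothetical generated data: 100,000 rows with 1,000 distinct (bucket, label) pairs.
with sample as (
  select
    mod(n, 1000) as bucket,
    cast(mod(n, 1000) as string) as label
  from unnest(generate_array(1, 100000)) as n
)
select
  count(distinct format('%t', t)) as exact_rows,         -- exact distinct row count
  approx_count_distinct(format('%t', t)) as approx_rows  -- statistical estimate
from sample t;
```

Here `exact_rows` should be 1000, and `approx_rows` should be close to that but is not guaranteed to match. On a table this small the performance difference is unlikely to be noticeable; the approximate version is mainly worth considering on very large inputs where a small error is acceptable.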
The use of count(distinct *) is not permitted.
Alternatively, you could explicitly name the columns that define uniqueness.
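As a rough sketch of naming the columns explicitly, suppose (hypothetically) that `id` and `name` define uniqueness and a `loaded_at` timestamp should be ignored. All three column names and the sample rows are assumptions for illustration.

```sql
-- Hypothetical sample data: the first two rows differ only in loaded_at.
with my_table as (
  select 1 as id, 'a' as name, timestamp '2024-01-01 00:00:00' as loaded_at union all
  select 1, 'a', timestamp '2024-01-02 00:00:00' union all
  select 2, 'b', timestamp '2024-01-02 00:00:00'
)
-- Only the listed columns take part in the uniqueness check.
select count(*) as distinct_keys
from (select distinct id, name from my_table) dt;
```

With this sample data the result should be 2, whereas `select distinct *` would keep all three rows because `loaded_at` differs. Naming the columns is more typing, but it makes the definition of a duplicate explicit.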