SQL Hadoop Hive vs Toad for DB2 - Multiple Distinct Counts
I am trying to construct a query for Toad; however, the following is not working.
select count (distinct t.column1, t.column2)
from schema.table t
;
However, the above query works just fine in Hadoop Hive. Any suggestions on refining the query so it works for Toad?
Try to concatenate them:
select count(distinct concat(t.column1, t.column2))
from schema.table t
You can use a subquery:
select count(1)
from (select distinct t.column1, t.column2 from schema.table t) as t1
;
Emulating the behavior is a little tricky. The safest method is probably:
select sum(case when seqnum = 1 and column1 is not null and column2 is not null then 1 else 0 end)
from (select t.*,
row_number() over (partition by column1, column2 order by column1) as seqnum
from t
) t
(The order by column doesn't matter. Many databases require one, so I regularly include it.)
This version works in any database that supports window functions, not just DB2.
The issue is that Hive does not count a row if any of the values are NULL, which this query takes into account.
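To see the NULL rule in action, the row_number() query can be run against a few hand-made rows. This is only a sketch: the demo table expression and its four rows are made up for illustration, and the VALUES-in-a-CTE syntax is assumed to be available (it is in DB2 for Linux, UNIX and Windows; other platforms may need a different way to build the sample).

```sql
-- Hypothetical sample rows; the NULL needs an explicit type in DB2.
with demo (column1, column2) as (
    values ('12', '3'),
           ('1', '23'),
           ('12', '3'),                      -- duplicate of the first pair
           (cast(null as varchar(2)), '3')   -- skipped: column1 is NULL
)
select sum(case when seqnum = 1
                 and column1 is not null
                 and column2 is not null
                then 1 else 0 end) as distinct_pairs
from (select d.*,
             row_number() over (partition by column1, column2
                                order by column1) as seqnum
      from demo d
     ) d;
```

With these rows the query should return 2: the duplicate pair is counted once and the row containing NULL is not counted at all, which is the Hive behavior being emulated.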
The use of select distinct in a subquery is close, but it counts NULL values, and that change may not be appropriate for other columns in your query.
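If nothing else in the query depends on those rows, one possible adjustment is to filter the NULLs out inside the subquery. This is a sketch using the table and column names from the question, not a tested drop-in:

```sql
-- Count distinct (column1, column2) pairs, ignoring any row
-- where either column is NULL (the Hive behavior).
select count(*)
from (select distinct t.column1, t.column2
      from schema.table t
      where t.column1 is not null
        and t.column2 is not null
     ) as t1;
```

The where clause removes rows for the whole subquery, so this only fits when the count is the sole thing being computed from it.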
Concatenating the columns comes closer. However, you have problems when there are overlapping values (say '12'/'3' and '1'/'23', which both concatenate to '123').
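A common workaround for the overlap is to put a separator between the columns. The sketch below assumes both columns are character types and that the separator ('|' here, an arbitrary choice) never appears in the data; non-character columns would need a cast first.

```sql
-- '12' || '|' || '3' gives '12|3', while '1' || '|' || '23' gives '1|23',
-- so the two pairs no longer collide.
select count(distinct t.column1 || '|' || t.column2)
from schema.table t;
```

In DB2 the || operator returns NULL when either operand is NULL, and count(distinct ...) ignores NULLs, so rows with a NULL in either column are left out, as in Hive. If the separator can occur in the data, collisions are still possible, which is why the row_number() version remains the safer choice.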