SQL Hadoop Hive vs Toad for DB2 - Multiple Distinct Counts

I am trying to construct a query for Toad, but the following does not work.

select count (distinct t.column1, t.column2)
from schema.table t
;

However, the above query works just fine in Hadoop Hive. Any suggestions on refining the query so it works for Toad?

Try to concatenate them:

select count(distinct concat(t.column1, t.column2))
from schema.table t

You can use a subquery:

select count(1)
from (select distinct t.column1, t.column2 from schema.table t) as t1
;

Emulating the behavior is a little tricky. The safest method is probably:

select sum(case when seqnum = 1 and column1 is not null and column2 is not null then 1 else 0 end)
from (select t.*,
             row_number() over (partition by column1, column2 order by column1) as seqnum
      from t
     ) t

(The order by column doesn't matter. Many databases require one, so I regularly include it.)

This version works in any database that supports window functions, not just DB2.

The issue is that Hive does not count a row if any of the values is NULL, which this query takes into account.
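
As a quick sanity check of the NULL handling, here is a minimal sketch that runs the row_number() query against an in-memory SQLite database from Python. SQLite stands in for DB2 here only because it is easy to run locally; the table contents are made up for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (column1 text, column2 text)")
# Hypothetical sample rows: one duplicated pair, one plain pair,
# and two rows that contain a NULL.
conn.executemany(
    "insert into t values (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("c", None), (None, None)],
)

# Count the first row of each (column1, column2) group,
# skipping any group where either column is NULL.
pair_count = conn.execute(
    """
    select sum(case when seqnum = 1
                     and column1 is not null
                     and column2 is not null then 1 else 0 end)
    from (select t.*,
                 row_number() over (partition by column1, column2
                                    order by column1) as seqnum
          from t
         ) t
    """
).fetchone()[0]

print(pair_count)  # 2: only ('a', 'x') and ('b', 'y') are counted
```

The duplicate ('a', 'x') row gets seqnum = 2 and is ignored, and the two rows containing a NULL fail the is not null checks.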

The use of select distinct in a subquery is close, but it counts NULL values, and that change may not be appropriate for other columns in your query.
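
To make the difference concrete, here is a small sketch, again using an in-memory SQLite database from Python with made-up rows. The where filter in the second query is an assumption added for illustration, not part of the subquery answer; it only suits a query that needs nothing else from the rows it filters out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (column1 text, column2 text)")
# Hypothetical sample rows, including two that contain a NULL.
conn.executemany(
    "insert into t values (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("c", None), (None, None)],
)

# Plain subquery: ('c', NULL) and (NULL, NULL) each count as a distinct pair.
with_nulls = conn.execute(
    "select count(*) from (select distinct column1, column2 from t) as t1"
).fetchone()[0]

# Same subquery, but rows with a NULL in either column are filtered out first.
without_nulls = conn.execute(
    """
    select count(*)
    from (select distinct column1, column2
          from t
          where column1 is not null and column2 is not null) as t1
    """
).fetchone()[0]

print(with_nulls, without_nulls)  # 4 2
```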

Concatenating columns together comes closer. However, you have problems when there are overlapping values (say '12'/'3' and '1'/'23').
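
The overlap problem is easy to reproduce. The sketch below uses an in-memory SQLite database from Python, with SQLite's || operator standing in for concat(). The '|' separator in the second query is an assumption for illustration: it only helps if that character can never appear in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (column1 text, column2 text)")
# The two overlapping pairs from the example above.
conn.executemany("insert into t values (?, ?)", [("12", "3"), ("1", "23")])

# Both rows concatenate to '123', so two distinct pairs collapse into one.
collided = conn.execute(
    "select count(distinct column1 || column2) from t"
).fetchone()[0]

# With a separator the strings become '12|3' and '1|23' and stay distinct.
separated = conn.execute(
    "select count(distinct column1 || '|' || column2) from t"
).fetchone()[0]

print(collided, separated)  # 1 2
```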
