Efficient query to group by column name in SQL or Hive
Imagine I have a table with 2 columns, m_1 and m_2:
m1 | m2
3  | 17
3  | 18
4  | 17
9  | 9
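I want a table with 3 columns: the column name (m), the value (d), and the number of times that value occurs in that column (count).
For the example, the result is: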
m   | d  | count
m_1 | 3  | 2
m_1 | 4  | 1
m_1 | 9  | 1
m_2 | 17 | 2
m_2 | 18 | 1
m_2 | 9  | 1
The first line reads as "the value 3 appears 2 times in column m_1".
A naive solution is to run a parameterized query twice, like this:
for (i in 1 .. 2)
SELECT CONCAT('m_', i), m_i, count(*) FROM table GROUP BY m_i
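But this algorithm scans my table twice. That is a problem because I have 255 m columns and billions of rows.
Would the solution be easier if I used Hive instead of a relational database?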
You can write this using union all and group by:
select colname, d, count(*)
from ((select 'm_1' as colname, m1 as d from t) union all
(select 'm_2' as colname, m2 as d from t)
) m12
group by colname, d;
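With 255 columns, writing every union all branch by hand is tedious, so the query text can be generated. The following is a minimal sketch, not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in database to check the result on the sample data, and the helper name build_union_query is hypothetical. The parentheses around each branch are dropped because SQLite does not accept them. Depending on the engine, a union all like this may still read the table once per branch.

```python
import sqlite3


def build_union_query(table, columns):
    """Build the UNION ALL + GROUP BY query for (label, column) pairs.

    Hypothetical helper for illustration; table and column names are
    interpolated directly, so only pass trusted identifiers.
    """
    # One SELECT per column, each tagged with that column's label.
    parts = [
        f"SELECT '{label}' AS colname, {col} AS d FROM {table}"
        for label, col in columns
    ]
    inner = "\nUNION ALL\n".join(parts)
    return (
        "SELECT colname, d, COUNT(*) AS cnt\n"
        f"FROM (\n{inner}\n) AS m12\n"
        "GROUP BY colname, d\n"
        "ORDER BY colname, d"
    )


# Sample data from the question, loaded into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (m1 INTEGER, m2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(3, 17), (3, 18), (4, 17), (9, 9)]
)

query = build_union_query("t", [("m_1", "m1"), ("m_2", "m2")])
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the real table, the column list could be produced with something like [(f"m_{i}", f"m{i}") for i in range(1, 256)], assuming the columns are named m1 through m255.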
In Hive, use posexplode(array(m1,m2)):
select concat('m_',cast(pe.pos+1 as string)) as m
,pe.val as d
,count(*) as `count`
from mytable t
lateral view posexplode(array(m1,m2)) pe
group by pos
,val
;
+------+-----+--------+
| m | d | count |
+------+-----+--------+
| m_1 | 3 | 2 |
| m_1 | 4 | 1 |
| m_1 | 9 | 1 |
| m_2 | 9 | 1 |
| m_2 | 17 | 2 |
| m_2 | 18 | 1 |
+------+-----+--------+
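To make the idea behind the posexplode query concrete, here is a rough Python sketch under stated assumptions. count_by_column emulates what the query computes: each row is turned into (pos, val) pairs with positions starting at 0, and the pairs are counted in a single pass over the rows. build_posexplode_query generates the Hive query text for many columns, assuming they are named m1 through mN and share one type so they fit in a single array. Both function names are hypothetical, and the generated query is only built as a string here, not run against Hive.

```python
from collections import Counter


def count_by_column(rows):
    """Emulate posexplode + group by: one pass over the rows."""
    counts = Counter()
    for row in rows:
        # posexplode yields (pos, val) pairs; pos is 0-based, hence pos + 1.
        for pos, val in enumerate(row):
            counts[(f"m_{pos + 1}", val)] += 1
    return counts


def build_posexplode_query(table, n_columns):
    """Generate the Hive query text for columns m1..m<n_columns>."""
    cols = ",".join(f"m{i}" for i in range(1, n_columns + 1))
    return (
        "select concat('m_',cast(pe.pos+1 as string)) as m\n"
        "      ,pe.val as d\n"
        "      ,count(*) as `count`\n"
        f"from {table} t\n"
        f"lateral view posexplode(array({cols})) pe\n"
        "group by pe.pos\n"
        "        ,pe.val\n"
        ";"
    )


sample = [(3, 17), (3, 18), (4, 17), (9, 9)]
counts = count_by_column(sample)
for (name, value), n in sorted(counts.items()):
    print(name, value, n)

print(build_posexplode_query("mytable", 2))
```

Calling build_posexplode_query("mytable", 255) would produce the same query with all 255 columns listed inside array(...).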