Efficient query to Group by column name in SQL or hive
Imagine I have a table with 2 columns, m_1 and m_2:
m1 | m2
3  | 17
3  | 18
4  | 17
9  | 9
I want a table with 3 columns:
In the example, the result is:
m   | d  | count
m_1 | 3  | 2
m_1 | 4  | 1
m_1 | 9  | 1
m_2 | 17 | 2
m_2 | 18 | 1
m_2 | 9  | 1
The first line is read as "the value 3 appears 2 times in column m_1".
A naive solution is to run a parameterized query twice, as follows:
for (i in 1 .. 2)
SELECT CONCAT('m_', i), m_i, count(*) FROM table GROUP BY m_i
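But this algorithm scans my table twice. That is a problem, because I have 255 m columns and billions of rows.
Would the solution become easier if I used Hive instead of a relational database?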
You can write this using union all and group by:
select colname, d, count(*)
from ((select 'm_1' as colname, m1 as d from t) union all
(select 'm_2' as colname, m2 as d from t)
) m12
group by colname, d;
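With 255 columns, writing every union all branch by hand is impractical, so the query text can be generated. The following is a minimal sketch, not part of the original answer: the helper name build_union_query is hypothetical, and SQLite is used only as a stand-in engine so the example can run locally on the sample data. Note that many engines may still read the table once per branch, so this shortens the code rather than guaranteeing a single scan.

```python
import sqlite3


def build_union_query(table, columns):
    """Build the UNION ALL + GROUP BY query for a list of column names.

    Hypothetical helper: labels are generated as m_1, m_2, ... in list order.
    """
    branches = [
        f"SELECT 'm_{i}' AS colname, {col} AS d FROM {table}"
        for i, col in enumerate(columns, start=1)
    ]
    inner = "\nUNION ALL\n".join(branches)
    return (
        "SELECT colname, d, COUNT(*) AS cnt\n"
        f"FROM (\n{inner}\n) m12\n"
        "GROUP BY colname, d\n"
        "ORDER BY colname, d"
    )


# Load the sample data from the question into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (m1 INTEGER, m2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(3, 17), (3, 18), (4, 17), (9, 9)])

rows = conn.execute(build_union_query("t", ["m1", "m2"])).fetchall()
for row in rows:
    print(row)
```

For the real table, the column list could be produced with something like [f"m{i}" for i in range(1, 256)], assuming the columns follow that naming pattern.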
posexplode(array(m1,m2))
select concat('m_',cast(pe.pos+1 as string)) as m
,pe.val as d
,count(*) as `count`
from mytable t
lateral view posexplode(array(m1,m2)) pe
group by pos
,val
;
+------+-----+--------+
| m | d | count |
+------+-----+--------+
| m_1 | 3 | 2 |
| m_1 | 4 | 1 |
| m_1 | 9 | 1 |
| m_2 | 9 | 1 |
| m_2 | 17 | 2 |
| m_2 | 18 | 1 |
+------+-----+--------+
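To make the idea behind posexplode concrete, here is a small Python sketch of the same logic on the sample data. It is only an illustration of the semantics, under the assumption that each row is turned into (position, value) pairs and the pairs are then counted; it is not how Hive executes the query, and the function name count_by_column is made up for this example.

```python
from collections import Counter

# Sample rows from the question: (m1, m2)
rows = [(3, 17), (3, 18), (4, 17), (9, 9)]


def count_by_column(rows):
    """Count (column label, value) pairs while reading each row only once."""
    counts = Counter()
    for row in rows:
        # enumerate() plays the role of posexplode: it yields (pos, val)
        # for every element of the per-row array, with pos starting at 0.
        for pos, val in enumerate(row):
            counts[(f"m_{pos + 1}", val)] += 1
    return counts


counts = count_by_column(rows)
for (m, d), n in sorted(counts.items()):
    print(m, d, n)
```

Each input row is visited once and expanded into one pair per column, which is why this approach needs only a single pass over the table regardless of how many columns are put into the array.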