简体   繁体   中英

Hive - explode and aggregate count?

I have a hive table that stores a drive's path to the leaf node, and how many files it has. And want to build a new hive table that can look up how many files exists under any folder depth. Can I get some tips on how to achieve this?

My Input table is:

>select * from default.file_file
path file_count
/teamA/projectA/ 3
/teamA/projectB/ 2

I'm trying to build the output table so that it can look like:

>select * from default.output_table
path file_count
/ 5
/teamA/ 5
/teamA/projectA/ 3
/teamA/projectB/ 2

So Far I tried the lateral_view + explode method, but that doesn't allow me to keep track of the directory hierarchy (each / is stored in separate rows).

You can split path, then use running collect_set analytic function + concat_ws to build path again. After this aggregate file_count group by path:

Demo:

SELECT stack (2,
'/teamA/projectA/', 3,
'/teamA/projectB/', 2
) AS  (path, file_count)
)
 
select path, sum(file_count) file_count
from
(
select  t.file_count, 
        concat(concat_ws('/',collect_set(s.node) over(partition by t.path order by level))
               , '/'
              ) path
  from Table1 t
       --remove trailing /, split and explode 
       lateral view posexplode(split(regexp_replace(t.path,'/$',''),'/')) s as level, node
)s
group by path;

Result:

path                file_count
/                   5
/teamA/             5
/teamA/projectA/    3
/teamA/projectB/    2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM