I currently have a table that contains an organization name, org_name, and an array of structs called types that contains structs with properties name and count. I am attempting to use BigQuery to figure out the "dominant" type by finding the type with the highest count and appending that type's name to the organization's row. My code to get the organization name and the array of structs is as follows:
CREATE TEMP FUNCTION GetNamesAndCounts(elements ARRAY<STRING>) AS (
ARRAY(
SELECT AS STRUCT elem AS name, COUNT(*) AS count
FROM UNNEST(elements) AS elem
GROUP BY elem
ORDER BY count
)
);
select org_name, GetNamesAndCounts(types_of_professionals) as types from table
This is a picture of the results from that query. For context, I would like there to be another column, dominant type, that displays the name of the type with the highest count.
As far as I can see, you order your structs ascending by the count column, so the last element is the one you want (as long as no other name has the same count). So you can just take
GetNamesAndCounts(types_of_professionals)[ordinal(array_length(GetNamesAndCounts(types_of_professionals)))]
(append .name to get only the name rather than the whole struct), or here is the full script:
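select
  org_name,
  -- all types with their counts, ascending by count
  array(
    select as struct elem as name, count(*) as count
    from unnest(types_of_professionals) as elem
    group by elem
    order by count
  ) as types,
  -- the single type with the highest count (a tie is broken arbitrarily)
  (
    select elem
    from unnest(types_of_professionals) as elem
    group by elem
    order by count(*) desc
    limit 1
  ) as dominant_type,
  -- every type that shares the highest count
  array(
    select elem
    from (
      select
        elem,
        -- no partition by here: the rank must run across all types in the row
        rank() over (order by count(*) desc) as rank
      from unnest(types_of_professionals) as elem
      group by elem
    )
    where rank = 1
  ) as all_dominant_types
from
  table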
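
To try the dominant-type logic without touching the real table, here is a minimal sketch that runs against inline data. The sample CTE, its two rows, and the rnk alias are made up for illustration; only org_name and types_of_professionals come from the question. It also assumes you want ties resolved alphabetically in dominant_type, which the question does not specify.

```sql
-- Hypothetical sample rows standing in for the real table.
WITH sample AS (
  SELECT 'Org A' AS org_name,
         ['nurse', 'doctor', 'nurse', 'doctor', 'clerk'] AS types_of_professionals
  UNION ALL
  SELECT 'Org B', ['doctor', 'doctor', 'clerk']
)
SELECT
  org_name,
  -- One winner per row; the secondary sort on elem makes ties deterministic
  -- (an assumption, not something the question asks for).
  (
    SELECT elem
    FROM UNNEST(types_of_professionals) AS elem
    GROUP BY elem
    ORDER BY COUNT(*) DESC, elem
    LIMIT 1
  ) AS dominant_type,
  -- Every type that shares the highest count within the row.
  ARRAY(
    SELECT elem
    FROM (
      SELECT elem, RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
      FROM UNNEST(types_of_professionals) AS elem
      GROUP BY elem
    )
    WHERE rnk = 1
    ORDER BY elem
  ) AS all_dominant_types
FROM sample;
```

In Org A, doctor and nurse both appear twice, so dominant_type should come back as doctor and all_dominant_types should hold both names; Org B has a single winner, doctor, in both columns.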