Standard SQL - How to count frequency of values in array
I have the following table, produced by this query:
SELECT
  fullVisitorId,
  COUNT(fullVisitorId) AS id_count,
  ARRAY_AGG(trafficSource.medium) AS trafic_medium
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
GROUP BY
  fullVisitorId
ORDER BY
  id_count DESC
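For each value in the trafic_medium column (e.g. cpc, referral, organic, etc.), I am trying to figure out how often it occurs in the array. Ideally I would add a new column "count" that shows how frequently each value occurs: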
+-----------+---------+------+
| array_agg | medium | count|
+-----------+---------+------+
| 123 | cpc | 2 |
+-----------+---------+------+
| | organic | 1 |
+-----------+---------+------+
| | cpc | 2 |
+-----------+---------+------+
| 456 | organic | 2 |
+-----------+---------+------+
| | organic | 2 |
+-----------+---------+------+
| | cpc | 1 |
+-----------+---------+------+
I am new to SQL, so I am quite confused.
So far I have tried this:
WITH medium AS
(
  SELECT
    fullVisitorId,
    COUNT(fullVisitorId) AS id_count,
    ARRAY_AGG(trafficSource.medium) AS trafic_medium
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
  GROUP BY
    fullVisitorId
  ORDER BY
    id_count DESC
)
SELECT
  fullVisitorId,
  trafic_medium,
  (SELECT AS STRUCT ANY_VALUE(trafic_medium) AS name, COUNT(*) AS count
   FROM
     UNNEST(trafic_medium) AS trafic_medium) AS trafic_medium_2
FROM
  medium
This is based on this thread: How to count frequency of elements in a bigquery array field
However, it only shows the count for a single ANY_VALUE result, not for each distinct value.
I would appreciate some help!
P.S. I am doing this on BigQuery's 'bigquery-public-data.google_analytics_sample' dataset.
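As background on why the attempt above returns a single value per visitor: an aggregate query without GROUP BY collapses all unnested elements into one row. The sketch below illustrates this on a hard-coded array literal (an assumption chosen to match the dummy data in the question, not the real table); the alias m is hypothetical.

```sql
#standardSQL
-- Without GROUP BY, the aggregates collapse the whole array into one row:
-- name is an arbitrary element, and count is the total number of elements (3 here).
SELECT ANY_VALUE(m) AS name, COUNT(*) AS `count`
FROM UNNEST(['cpc', 'organic', 'cpc']) AS m;

-- With GROUP BY, there is one row per distinct value (cpc with 2, organic with 1).
SELECT m AS name, COUNT(*) AS `count`
FROM UNNEST(['cpc', 'organic', 'cpc']) AS m
GROUP BY m;
```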
Below is a BigQuery Standard SQL query to get you started:
#standardSQL
SELECT id, trafic_medium,
  ARRAY(
    SELECT AS STRUCT medium, COUNT(1) `count`
    FROM t.trafic_medium medium
    GROUP BY medium
  ) stats
FROM `project.dataset.table` t
Applied to the sample/dummy data from your question, it looks like the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 123 id, ['cpc', 'organic', 'cpc'] trafic_medium UNION ALL
  SELECT 456, ['organic', 'organic', 'cpc']
)
SELECT id, trafic_medium,
  ARRAY(
    SELECT AS STRUCT medium, COUNT(1) `count`
    FROM t.trafic_medium medium
    GROUP BY medium
  ) stats
FROM `project.dataset.table` t
-- ORDER BY id
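The result is one row per id, with stats holding one struct per distinct medium: cpc with count 2 and organic with count 1 for id 123, and organic with count 2 and cpc with count 1 for id 456.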
As an option, you can use the version below:
#standardSQL
SELECT id,
  ARRAY(
    SELECT AS STRUCT medium, `count`
    FROM t.trafic_medium medium
    LEFT JOIN (
      SELECT AS STRUCT medium, COUNT(1) `count`
      FROM t.trafic_medium medium
      GROUP BY medium
    ) stats
    USING(medium)
  ) trafic_medium
FROM `project.dataset.table` t
-- ORDER BY id
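Applied to the same dummy data, this returns one struct per array element, each paired with the count of its medium: for id 123, both cpc entries carry count 2 and organic carries count 1; for id 456, both organic entries carry count 2 and cpc carries count 1.
This version looks closer to your expected result.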
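To connect the answer back to the query in the question, here is a minimal sketch that applies the same ARRAY(... GROUP BY ...) pattern to the public Google Analytics sample table instead of the dummy data. It assumes the CTE from the question is reused as-is; the alias m and the output column name stats are illustrative choices, not part of the original answer.

```sql
#standardSQL
WITH medium AS (
  -- One row per visitor, with all of that visitor's traffic mediums in an array.
  SELECT
    fullVisitorId,
    COUNT(fullVisitorId) AS id_count,
    ARRAY_AGG(trafficSource.medium) AS trafic_medium
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
  GROUP BY fullVisitorId
)
SELECT
  fullVisitorId,
  id_count,
  trafic_medium,
  -- Unnest the array, group by value, and collect one struct per distinct medium.
  ARRAY(
    SELECT AS STRUCT m AS medium, COUNT(1) AS `count`
    FROM UNNEST(trafic_medium) AS m
    GROUP BY m
  ) AS stats
FROM medium
ORDER BY id_count DESC
```

The ARRAY(...) wrapper matters here: once the subquery groups by value it can return several rows, and a plain scalar subquery must return at most one.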