
Standard SQL - How to count frequency of values in array

I have the following table, produced by the query below:

Result of the SQL query - screenshot

SELECT 
  fullVisitorId,
  COUNT(fullVisitorId) as id_count,
  ARRAY_AGG(trafficSource.medium) AS trafic_medium
FROM 
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
GROUP BY
  fullVisitorId
ORDER BY
  id_count DESC

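For each value in the trafic_medium column (e.g. cpc, referral, organic, etc.), I am trying to work out how often it occurs in the array. Ideally I would add a new column "count" that shows how frequently each value occurs, like this: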

+-----------+---------+------+
| array_agg | medium  | count|
+-----------+---------+------+
| 123       | cpc     |   2  |
+-----------+---------+------+
|           | organic |   1  |
+-----------+---------+------+
|           | cpc     |   2  |
+-----------+---------+------+
| 456       | organic |   2  |
+-----------+---------+------+
|           | organic |   2  |
+-----------+---------+------+
|           | cpc     |   1  |
+-----------+---------+------+

I'm new to SQL, so I'm quite stuck.

So far I have tried this:

WITH medium AS
(
    SELECT 
        fullVisitorId,
        COUNT(fullVisitorId) as id_count,
        ARRAY_AGG(trafficSource.medium) AS trafic_medium
    FROM 
        `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
    GROUP BY
        fullVisitorId
    ORDER BY
        id_count DESC
) 
SELECT
    fullVisitorId,
    trafic_medium,
    (SELECT AS STRUCT Any_Value(trafic_medium) AS name, COUNT(*) AS count
FROM 
    UNNEST(trafic_medium) AS trafic_medium) AS trafic_medium_2,
FROM 
    medium

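This is based on the thread: How to count frequency of elements in a bigquery array field

However, it only shows the count for the single value returned by ANY_VALUE, not a count for each distinct value.

I would appreciate some help!

P.S. I am doing this in BigQuery on the `bigquery-public-data.google_analytics_sample` dataset.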

Below is for BigQuery Standard SQL and should help you get started

#standardSQL
SELECT id, trafic_medium,
  ARRAY(
    SELECT AS STRUCT medium, COUNT(1) `count`
    FROM t.trafic_medium medium
    GROUP BY medium
  ) stats
FROM `project.dataset.table` t

You can test and play with the query above using the sample/dummy data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 123 id, ['cpc', 'organic', 'cpc'] trafic_medium UNION ALL
  SELECT 456, ['organic', 'organic', 'cpc']
)
SELECT id, trafic_medium,
  ARRAY(
    SELECT AS STRUCT medium, COUNT(1) `count`
    FROM t.trafic_medium medium
    GROUP BY medium
  ) stats
FROM `project.dataset.table` t
-- ORDER BY id   

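The result has one row per id, with a stats array that holds each distinct medium and its count (shown as a screenshot in the original answer).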
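To connect this back to the query in the question: the attempt there returns a single struct per visitor because its subquery has no GROUP BY, so COUNT(*) counts every element of the array and ANY_VALUE picks one arbitrary element. Below is a minimal sketch of the same GROUP BY pattern applied to the CTE from the question. It is an adaptation, not part of the original answer; the explicit UNNEST and the inner ORDER BY are additions for readability.

```sql
#standardSQL
WITH medium AS (
  -- Same aggregation as in the question: one row per visitor
  SELECT
    fullVisitorId,
    COUNT(fullVisitorId) AS id_count,
    ARRAY_AGG(trafficSource.medium) AS trafic_medium
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
  GROUP BY fullVisitorId
)
SELECT
  fullVisitorId,
  id_count,
  ARRAY(
    -- GROUP BY yields one struct per distinct medium in the array
    SELECT AS STRUCT m AS medium, COUNT(1) AS `count`
    FROM UNNEST(trafic_medium) AS m
    GROUP BY m
    ORDER BY `count` DESC
  ) AS stats
FROM medium
ORDER BY id_count DESC
```

The ORDER BY moves from the CTE to the outer query, since ordering inside a CTE does not determine the order of the final result.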

As an option, you can use the version below

#standardSQL
SELECT id, 
  ARRAY(
    SELECT AS STRUCT medium, `count`
    FROM t.trafic_medium medium
    LEFT JOIN (
      SELECT AS STRUCT medium, COUNT(1) `count`
      FROM t.trafic_medium medium
      GROUP BY medium
    ) stats
    USING(medium) 
  ) trafic_medium  
FROM `project.dataset.table` t
-- ORDER BY id   

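which (if applied to the same dummy data) returns, for each id, an array with one entry per original element, each paired with the count of its medium (shown as a screenshot in the original answer).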

This version looks closer to your expected result
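If flat rows are preferred over an array of structs, as in the table sketched in the question, one possible alternative is to unnest the array and count with a window function. This is a sketch that is not part of the original answer; it reuses the dummy data from above, and the alias `pos` is an illustrative name for the element offset.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 123 id, ['cpc', 'organic', 'cpc'] trafic_medium UNION ALL
  SELECT 456, ['organic', 'organic', 'cpc']
)
SELECT
  id,
  medium,
  -- Occurrences of this medium within the same id's array
  COUNT(1) OVER (PARTITION BY id, medium) AS `count`
FROM `project.dataset.table` t,
  UNNEST(t.trafic_medium) AS medium WITH OFFSET pos
ORDER BY id, pos
-- Expected rows for the dummy data:
--   123  cpc      2
--   123  organic  1
--   123  cpc      2
--   456  organic  2
--   456  organic  2
--   456  cpc      1
```

WITH OFFSET exposes each element's position, so ordering by it keeps the rows in the original array order.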
