SQL UNION ALL but with lots of columns on BigQuery?

The image above is a screenshot of my table, as a quick initial reference.

The focal point is the multiple mech columns (mech1, mech2, mech3, and mech4).

Board games in this table have multiple attributes called mechanisms, so I've separated them into 4 different columns.

I've learned how to combine columns vertically via UNION ALL so that I can query the count of each unique game mechanism in my table.

However, it got me wondering if there's a shorter and more efficient way to achieve what I've done:

WITH mechanism_info AS
        (
            WITH
                mechanism_col_combined AS
                    (
                        SELECT mech1 AS all_mech_columns_combined
                        FROM `ckda-portfolio-2022.bg_collection.base`
                        UNION ALL
                        ## There's no IS NOT NULL condition defined for column 'mech1' since there's at least one mechanism noted for a game.
                        SELECT mech2
                        FROM `ckda-portfolio-2022.bg_collection.base`
                        WHERE mech2 IS NOT NULL
                        UNION ALL
                        SELECT mech3
                        FROM `ckda-portfolio-2022.bg_collection.base`
                        WHERE mech3 IS NOT NULL
                        UNION ALL
                        SELECT mech4
                        FROM `ckda-portfolio-2022.bg_collection.base`
                        WHERE mech4 IS NOT NULL
                    )
                    ## Temporary table with all mechanism column in the collection combined.
            SELECT DISTINCT(all_mech_columns_combined) AS unique_mechanisms, COUNT(*) AS count
            FROM mechanism_col_combined
            GROUP BY all_mech_columns_combined
            ORDER BY all_mech_columns_combined
        )
SELECT *
FROM mechanism_info

By querying this temp table, SQL returns the information I anticipated, as below:

unique_mechanisms | count
Acting            |   1
Action Points     |   3
Action Queue      |   1
Action Retrieval  |   1
Area Movement     |   1
Auction/Bidding   |   5
Bag Building      |   1
Betting & Bluffing|   2
Bingo             |   1
Bluffing          |   7

Now, I want to shorten my code, and I know there has to be a way to shorten the repetitive process of combining columns with UNION ALL.

And if there are any other tips or methods for shortening my query, please let me know!

Thank you.

You can convert the multiple columns [mech1, mech2, ...] into a single array column mech_arr and then use UNNEST to expand that array so that each row holds one scalar value.

For example:

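WITH table1 AS (
    -- Sample data standing in for the real table
    SELECT 'AA' AS mech1, 'BB' AS mech2, 'CC' AS mech3
    UNION ALL SELECT 'AA' AS mech1, 'CC' AS mech2, 'EE' AS mech3
),
-- Pack the mech columns into one array per row
table2 AS (SELECT [mech1, mech2, mech3] AS mech_arr FROM table1)

-- UNNEST expands each array into one row per element
SELECT mech, COUNT(*) AS mech_counts
FROM table2, UNNEST(mech_arr) AS mech
GROUP BY mech
ORDER BY mech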

Output:

mech | mech_counts
AA   |   2
BB   |   1
CC   |   2
EE   |   1
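
Applied to the table from the question, the same idea might look like the sketch below. It is untested against the real data and assumes that mech1 through mech4 all have the same type (for example STRING). Because mech2, mech3, and mech4 can be NULL, the array will contain NULL elements; BigQuery allows that inside a query, and the WHERE clause drops them after UNNEST.

```sql
-- Build the array inline and unnest it in the same FROM clause,
-- so no intermediate CTE is needed.
SELECT mech AS unique_mechanisms, COUNT(*) AS count
FROM `ckda-portfolio-2022.bg_collection.base`,
     UNNEST([mech1, mech2, mech3, mech4]) AS mech
WHERE mech IS NOT NULL   -- skip empty mechanism slots
GROUP BY mech
ORDER BY mech
```

This reads the base table once instead of once per column, and adding a fifth mech column only means adding one more element to the array.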

You could self join the table, but the performance would not improve and the query would be just as long.
You can simplify as follows:

SELECT
  mech_column,
  -- BigQuery treats "number" in double quotes as a string literal,
  -- so the alias is written as a plain identifier.
  COUNT(*) AS number
FROM  (
       SELECT mech1 AS mech_column
       FROM `ckda-portfolio-2022.bg_collection.base`
         UNION ALL
       SELECT mech2
       FROM `ckda-portfolio-2022.bg_collection.base`
         UNION ALL
       SELECT mech3
       FROM `ckda-portfolio-2022.bg_collection.base`
         UNION ALL
       SELECT mech4
       FROM `ckda-portfolio-2022.bg_collection.base`
       ) m
WHERE mech_column IS NOT NULL
GROUP BY mech_column
ORDER BY mech_column;

I didn't find a smoother way to query, but I did find a way to avoid adding WHERE column IS NOT NULL to each and every column that is stacked vertically into a single column:

WITH mechanism_info AS
    (
        WITH
            mechanism_col_combined AS
                (
                    SELECT mech1 AS mech_columns
                    FROM `ckda-portfolio-2022.bg_collection.base`
                    UNION ALL
                    SELECT mech2
                    FROM `ckda-portfolio-2022.bg_collection.base`
                    UNION ALL
                    SELECT mech3
                    FROM `ckda-portfolio-2022.bg_collection.base`
                    UNION ALL
                    SELECT mech4
                    FROM `ckda-portfolio-2022.bg_collection.base`
                    ## Removed all WHERE clauses from the above columns
                    ## and added one below instead.
                )
                ## Temporary table with all mechanism columns in the collection combined.
        SELECT DISTINCT(mech_columns) AS mechanisms, COUNT(*) AS count
        FROM mechanism_col_combined
        WHERE mech_columns IS NOT NULL ## <--- Added here!
        GROUP BY mech_columns
        ORDER BY mech_columns
    )
SELECT *
FROM mechanism_info

Since mechanism_info is a nested temp table, I can just add a WHERE mech_columns IS NOT NULL clause to the initial temp table's definition.

I'm still looking to reduce this query down to something more efficient. It's unfortunate that UNION ALL can't select multiple columns with a single call :(
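
On the wish for a single call that covers several columns: BigQuery's UNPIVOT operator may be worth trying. The sketch below is untested against the real table and assumes that mech1 through mech4 all share the same type (for example STRING). The name mech_source is a hypothetical label column introduced only for illustration. By default UNPIVOT excludes NULL values, so no IS NOT NULL filter should be needed.

```sql
-- Rotate mech1..mech4 into rows: one output row per non-NULL mechanism.
-- mech_source (hypothetical name) records which column each value came from.
SELECT mech AS unique_mechanisms, COUNT(*) AS count
FROM `ckda-portfolio-2022.bg_collection.base`
UNPIVOT (mech FOR mech_source IN (mech1, mech2, mech3, mech4))
GROUP BY mech
ORDER BY mech
```

The column list inside IN (...) is the only place that grows if more mech columns are added later.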
