
Google BigQuery (LegacySQL) GROUP BY

I was working through a hello-world example on Google BigQuery. I expected to run the query below in Cloud Shell and get some results back, but instead I got an error about the GROUP BY list.

But the Google documentation says otherwise: according to it, the Legacy SQL in question is correct.

=====

$ bq query --use_legacy_sql=true "SELECT REGEXP_REPLACE(title,r'_', ' ') AS regexp_title, views FROM (SELECT * FROM [bigquery-samples:wikipedia_benchmark.Wiki100M] WHERE NOT title CONTAINS ':' AND wikimedia_project='wp' AND language='en' AND REGEXP_MATCH(title, r'^G.*o.*o.*e$') GROUP BY title ORDER BY views DESC LIMIT 10)"

Waiting on bqjob_r47d6732dcb76803b_00000163cfb22bdc_1 ... (0s) Current status: DONE

Error in query string: Error processing job 'ordinal-throne-172104:bqjob_r47d6732dcb76803b_00000163cfb22bdc_1': Expression 'year' is not present in the GROUP BY list

=====

Could an expert please help me shed some light on this?

Thank you. Will

This is not actually a BigQuery issue but a SQL one: the query you are trying to run is invalid, and most SQL engines that enforce standard GROUP BY rules would reject it as well. At first sight, I see several issues:

  1. You use a GROUP BY clause, but you are not aggregating anything. GROUP BY is normally used together with aggregate functions (such as COUNT, MAX, MIN, SUM or AVG) to collapse each group of rows into a single result row, but your query uses none.
  2. Every field you retrieve must either appear in the GROUP BY list or be wrapped in an aggregate function. In your nested query you select all fields (*) but group only by title; year, for instance, is neither grouped nor aggregated, which is exactly what BigQuery is complaining about: Expression 'year' is not present in the GROUP BY list.
  3. You are running a nested SELECT query. If you are only interested in the title and views fields, why not query them directly? BigQuery is column-oriented and only reads the columns your query references, so selecting * in the subquery makes it scan every column of the table instead of just the ones you need.
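To see why point 2 matters, the short Python sketch below groups a few rows by title. The column names title, year and views come from the question; the row values are made up for illustration and are not taken from Wiki100M. Within one group, a non-grouped column such as year can hold several different values, so the engine has no single value to return unless the column is grouped or wrapped in an aggregate such as SUM.

```python
from collections import defaultdict

# Hypothetical rows: only the column names come from the question.
rows = [
    {"title": "Google", "year": 2010, "views": 100},
    {"title": "Google", "year": 2011, "views": 250},
    {"title": "Goose", "year": 2010, "views": 40},
]

# GROUP BY title: collect every row that shares the same title.
groups = defaultdict(list)
for row in rows:
    groups[row["title"]].append(row)

# A bare, non-grouped column: one group can contain several distinct years,
# so there is no single value to put in that group's output row.
years_per_title = {title: sorted({r["year"] for r in grp}) for title, grp in groups.items()}
print(years_per_title)  # {'Google': [2010, 2011], 'Goose': [2010]}

# An aggregate such as SUM collapses each group to exactly one value.
views_per_title = {title: sum(r["views"] for r in grp) for title, grp in groups.items()}
print(views_per_title)  # {'Google': 350, 'Goose': 40}
```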

Therefore, I think a query like the one below better fits what I understand you are trying to do. Feel free to replace the aggregate function with whichever one suits you:

SELECT
  REGEXP_REPLACE(title,r'_', ' ') AS regexp_title,
  SUM(views) as sum_views
FROM
  [bigquery-samples:wikipedia_benchmark.Wiki100M]
WHERE
  NOT title CONTAINS ':'
  AND wikimedia_project='wp'
  AND language='en'
  AND REGEXP_MATCH(title, r'^G.*o.*o.*e$')
GROUP BY
  regexp_title
ORDER BY
  sum_views DESC
LIMIT
  10
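If you want to sanity-check the logic before running it against the full table, the corrected query can be mimicked locally. The Python sketch below applies the same WHERE filters, the underscore-to-space replacement, SUM(views) per regexp_title, and ORDER BY ... DESC LIMIT 10 to a few in-memory rows. The function name top_titles and the sample rows are hypothetical; real results can only come from BigQuery itself.

```python
import re
from collections import defaultdict

TITLE_PATTERN = re.compile(r"^G.*o.*o.*e$")  # same regex as REGEXP_MATCH in the query


def top_titles(rows, limit=10):
    """Hypothetical local emulation of the corrected Legacy SQL query."""
    totals = defaultdict(int)
    for row in rows:
        title = row["title"]
        # WHERE NOT title CONTAINS ':' AND wikimedia_project='wp'
        #   AND language='en' AND REGEXP_MATCH(title, r'^G.*o.*o.*e$')
        if (":" in title
                or row["wikimedia_project"] != "wp"
                or row["language"] != "en"
                or not TITLE_PATTERN.search(title)):
            continue
        # REGEXP_REPLACE(title, r'_', ' ') AS regexp_title ... GROUP BY regexp_title
        regexp_title = re.sub(r"_", " ", title)
        totals[regexp_title] += row["views"]  # SUM(views) AS sum_views
    # ORDER BY sum_views DESC LIMIT <limit>
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up sample rows, not real Wiki100M data.
sample_rows = [
    {"title": "Google", "wikimedia_project": "wp", "language": "en", "views": 500},
    {"title": "Google", "wikimedia_project": "wp", "language": "en", "views": 300},
    {"title": "Google_Chrome", "wikimedia_project": "wp", "language": "en", "views": 120},
    {"title": "Google:Sandbox_page", "wikimedia_project": "wp", "language": "en", "views": 999},
    {"title": "Google", "wikimedia_project": "wp", "language": "de", "views": 999},
]

print(top_titles(sample_rows))  # [('Google', 800), ('Google Chrome', 120)]
```

Because the grouping key is the already-replaced regexp_title, rows whose titles differ only by underscores versus spaces end up summed together, just as grouping by the alias does in the SQL.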
