简体   繁体   中英

BigQuery unexpected keyword

I'm just getting started with Google BigQuery, and have run into issues with my very first query. I'm trying to get a list of Stack Overflow posts since and including 2015-01-01 which have one of several tags. Below is my first pass at the query:

#standardSQL

SELECT
 title,
 body,
 answer_count,
 creation_date,
 tags,
 view_count 

FROM
 `bigquery-public-data.stackoverflow.posts_questions` limit 10

WHERE
 creation_date >= "2015-01-01" AND tags HAVING "terraform" OR "chef" OR "puppet" OR "ansible"

The BigQuery validator is showing the following error message:

Error : Syntax error: Unexpected keyword WHERE at [14:1]

You have a few syntax errors, namely the limit 10 in the wrong place, and using the HAVING keyword incorrectly. I'd also use native timestamp instead of comparing strings:

#standardSQL
SELECT
  title,
  body,
  answer_count,
  creation_date,
  tags,
  view_count
FROM
  `bigquery-public-data.stackoverflow.posts_questions`
WHERE
  creation_date >= TIMESTAMP('2015-01-01')
  AND tags IN ('terraform',
    'chef',
    'puppet',
    'ansible')
LIMIT
  10

There are a few issues here, but hopefully this will help:

  • LIMIT needs to come last, after the WHERE clause.
  • While HAVING is valid in some contexts--namely as a post-aggregation (ie after GROUP BY) filter, it doesn't have the meaning that I think you want here.

With that said this query might be what you want:

#standardSQL
SELECT
  title,
  body,
  answer_count,
  creation_date,
  tags,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE creation_date >= "2015-01-01" AND
  EXISTS (
    SELECT 1 FROM UNNEST(SPLIT(tags, "|")) AS tag
    WHERE tag IN ("terraform", "chef", "puppet", "ansible")
  )
LIMIT 10;

Note that I needed to use SPLIT with the tags column because the tags are separated by the pipe character. Since you get a terabyte of querying for free, try to make the most of it by getting all the results at once rather than using the LIMIT, too.

SELECT
usertype, CONCAT(start_station_name, " to ",end_station_name) AS route, COUNT(*) as num_trips, ROUND(AVG(cast(tripduration as int64)/60),2) AS duration FROM bigquery-public-data.new_york_citibike.citibike_trips GROUP BY start_station_name, end_station_name,usertype, ORDER BY num_trips DESC LIMIT 10

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM