Google BigQuery SQL Statement

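I'm trying to pull some data from the GitHub Archive using Google BigQuery. The amount of data I'm currently requesting is too large for BigQuery to handle (at least on the free tier), so I'm trying to narrow the scope of the request.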

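I want to restrict the data so that historical data is returned only for repositories that currently have more than 1000 stars. This is more complicated than just saying "repository_watchers > 1000", because that condition would exclude the historical data for the first 1000 stars a repository received.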

SELECT repository_name, repository_owner, created_at, type, repository_url, repository_watchers
FROM [githubarchive:github.timeline]
WHERE type="WatchEvent"
ORDER BY created_at DESC

Edit: the solution I used (based on @Brian's answer)

select y.repository_name, y.repository_owner, y.created_at, y.type, y.repository_url, y.repository_watchers
  from [githubarchive:github.timeline] y
  join (select repository_url, max(repository_watchers)
          from [githubarchive:github.timeline] x
         where x.type = 'WatchEvent'
         group by repository_url
        having max(repository_watchers) > 1000) x
    on y.repository_url = x.repository_url
  where y.type = 'WatchEvent'
 order by y.repository_name, y.repository_owner, y.created_at desc
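
As a quick local check of why the join is needed instead of a plain "repository_watchers > 1000" filter, here is a minimal sketch that runs the same query shape against an in-memory SQLite table. SQLite standing in for BigQuery, the table name `timeline`, and the sample rows are all assumptions made for illustration; only the column names come from the queries above.

```python
import sqlite3

# Hypothetical stand-in for [githubarchive:github.timeline]; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline ("
    "repository_url TEXT, type TEXT, created_at TEXT, repository_watchers INTEGER)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?, ?)",
    [
        ("https://example.test/big", "WatchEvent", "2012-01-01", 10),
        ("https://example.test/big", "WatchEvent", "2012-06-01", 900),
        ("https://example.test/big", "WatchEvent", "2013-01-01", 1500),
        ("https://example.test/small", "WatchEvent", "2012-03-01", 5),
        ("https://example.test/small", "WatchEvent", "2013-02-01", 40),
    ],
)

# Naive filter: keeps only events recorded after a repository passed 1000.
naive = conn.execute(
    "SELECT repository_url, created_at FROM timeline "
    "WHERE type = 'WatchEvent' AND repository_watchers > 1000"
).fetchall()

# Join against the repositories whose maximum watcher count exceeds 1000,
# so the early history of those repositories is kept as well.
joined = conn.execute(
    """
    SELECT y.repository_url, y.created_at, y.repository_watchers
      FROM timeline y
      JOIN (SELECT repository_url, MAX(repository_watchers) AS max_watchers
              FROM timeline
             WHERE type = 'WatchEvent'
             GROUP BY repository_url
            HAVING MAX(repository_watchers) > 1000) x
        ON y.repository_url = x.repository_url
     WHERE y.type = 'WatchEvent'
     ORDER BY y.created_at DESC
    """
).fetchall()

print(len(naive), len(joined))
```

On this toy data the naive filter returns only the single row recorded after the repository crossed 1000 watchers, while the join returns all three rows for that repository and none for the smaller one.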

Try:

select y.*
  from [githubarchive:github.timeline] y
  join (select repository_name, max(repository_watchers)
          from [githubarchive:github.timeline]
         where type = 'WatchEvent'
         group by repository_name
        having max(repository_watchers) > 1000) x
    on y.repository_name = x.repository_name
 order by y.created_at desc

If that syntax isn't supported, you can use the following three-step solution:

Step 1: Find which REPOSITORY_NAME values have at least one record with REPOSITORY_WATCHERS > 1000

select repository_name, max(repository_watchers) as curr_watchers
  from [githubarchive:github.timeline]
 where type = 'WatchEvent'
 group by repository_name
having max(repository_watchers) > 1000

Step 2: Store the result as a table and name it SUB

Step 3: Run the following against SUB (and your original table)

select y.*
  from [githubarchive:github.timeline] y
  join sub x
    on y.repository_name = x.repository_name
 order by y.created_at desc
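
The three steps can be tried out locally with a rough sketch like the one below. It uses an in-memory SQLite database rather than BigQuery, and the table name `timeline` and the sample rows are hypothetical; `CREATE TABLE ... AS SELECT` plays the role of saving the step 1 result as a table named SUB.

```python
import sqlite3

# Hypothetical stand-in for [githubarchive:github.timeline]; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline ("
    "repository_name TEXT, type TEXT, created_at TEXT, repository_watchers INTEGER)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?, ?)",
    [
        ("big", "WatchEvent", "2012-01-01", 10),
        ("big", "PushEvent", "2012-02-01", 20),
        ("big", "WatchEvent", "2012-06-01", 900),
        ("big", "WatchEvent", "2013-01-01", 1500),
        ("small", "WatchEvent", "2012-03-01", 5),
        ("small", "WatchEvent", "2013-02-01", 40),
    ],
)

# Steps 1 and 2: run the aggregate query and store its result as a table named sub.
conn.execute(
    """
    CREATE TABLE sub AS
    SELECT repository_name, MAX(repository_watchers) AS curr_watchers
      FROM timeline
     WHERE type = 'WatchEvent'
     GROUP BY repository_name
    HAVING MAX(repository_watchers) > 1000
    """
)

# Step 3: join the original table against sub to pull the full history.
rows = conn.execute(
    """
    SELECT y.*
      FROM timeline y
      JOIN sub x
        ON y.repository_name = x.repository_name
     ORDER BY y.created_at DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the step 3 query has no filter on `type`, so in this sketch the `PushEvent` row for the qualifying repository is returned too. If only star history is wanted, adding a `y.type = 'WatchEvent'` condition to step 3 would narrow it down.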

