
How to find the median using Google BigQuery based on both the user ID and the number of levels the ID has visited?

I have about 100,000 user IDs, each of which visits some number n of levels. I need to find the overall median number of levels visited per user.

I've tried to use AVG based on the number of IDs on each level and the total number of IDs that started the application, but the values vary a lot.

To find the IDs that started the application:

SELECT 
event_names, COUNT(DISTINCT id) uniques, COUNT(id) AS total 
FROM xyz.analytics_111.xyz 
WHERE (date BETWEEN "20191018" AND "20191024") AND version = "3.1" AND event_names in ("app_open","internet") AND platform = "ANDROID" 
AND id IN ( SELECT DISTINCT id FROM abc.analytics_111.abc WHERE event_names = "internet" AND internet_status = 1 ) 
GROUP BY event_names

To find the total users on each level:

SELECT event_names, story_name, level, COUNT(DISTINCT id) uniques, COUNT(id) AS total 
FROM xyz.analytics_111.xyz 
WHERE (date BETWEEN "20191018" AND "20191024") AND version = "3.1" AND event_names in ("start_level","end_level") AND platform = "ANDROID" 
AND id IN ( SELECT DISTINCT id FROM abc.analytics_111.abc WHERE event_names = "internet" AND internet_status = 1 ) 
GROUP BY event_names, story_name, level 
ORDER BY event_names DESC, story_name, level

After this, I divide the sum of the user ID counts across all levels by the number of user IDs that started the application, to get the average number of levels visited by each user. Is there a way to find the median instead?

The question doesn't have enough details for a complete answer, but with the elements you've given us:

  • Don't use AVG when you want MEDIAN.
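To illustrate why the two can disagree so much, here is a small sketch on made-up per-user level counts (the array values are hypothetical, not taken from the question). It uses BigQuery's built-in analytic functions AVG and PERCENTILE_CONT over the whole input:

```sql
-- Hypothetical levels-visited counts for five users; one heavy user skews the mean.
-- SELECT DISTINCT collapses the identical per-row analytic results into one row.
SELECT DISTINCT
  AVG(levels) OVER () AS avg_levels,                     -- (1+1+2+2+40)/5 = 9.2
  PERCENTILE_CONT(levels, 0.5) OVER () AS median_levels  -- middle sorted value = 2.0
FROM UNNEST([1, 1, 2, 2, 40]) AS levels
```

A single user who visits 40 levels pulls the average up to 9.2, while the median stays at 2, which is closer to what a typical user does.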

To calculate a median, you can do something like:

SELECT level_id, bqutil.fn.median(ARRAY_AGG(some_number)) AS median_value
FROM `table`
GROUP BY level_id

bqutil.fn.median() is a public UDF we shared with the world.
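If you would rather not depend on a shared UDF, a sketch using only BigQuery built-ins is below. `APPROX_QUANTILES(x, 2)` returns a three-element array of approximate [min, median, max], so `[OFFSET(1)]` picks the approximate median. `table`, level_id and some_number are the same placeholders used in the query above.

```sql
-- Approximate median of some_number for each level_id, built-in functions only.
-- `table`, level_id and some_number are placeholders, as in the UDF example.
SELECT
  level_id,
  APPROX_QUANTILES(some_number, 2)[OFFSET(1)] AS approx_median
FROM `table`
GROUP BY level_id
```

For an exact value, `PERCENTILE_CONT(some_number, 0.5) OVER (PARTITION BY level_id)` is an analytic alternative; because it returns a value on every row, pair it with `SELECT DISTINCT level_id, ...` to get one row per group.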


Now, with the extra details you provided, if you want "to get the AVG of Levels visited by each user", then:

SELECT AVG(levels) avg_levels_for_users
FROM (
  SELECT id user, COUNT(DISTINCT level) levels
  FROM `....`
  GROUP BY user
)
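The same per-user subquery can also produce the median. The sketch below fills in the filters from the question's second query (table names, column names and the date range are copied from it) and reports the average next to an approximate median. Counting `DISTINCT level` assumes level values do not repeat across different story_name values; if they do, count distinct (story_name, level) combinations instead.

```sql
-- Step 1: number of distinct levels each user visited (filters copied from the question).
WITH levels_per_user AS (
  SELECT id, COUNT(DISTINCT level) AS levels  -- assumes level is unique across stories
  FROM xyz.analytics_111.xyz
  WHERE (date BETWEEN "20191018" AND "20191024")
    AND version = "3.1"
    AND event_names IN ("start_level", "end_level")
    AND platform = "ANDROID"
    AND id IN (
      SELECT DISTINCT id FROM abc.analytics_111.abc
      WHERE event_names = "internet" AND internet_status = 1
    )
  GROUP BY id
)
-- Step 2: summarise the per-user counts across all users.
SELECT
  COUNT(*) AS users,
  AVG(levels) AS avg_levels,
  APPROX_QUANTILES(levels, 2)[OFFSET(1)] AS approx_median_levels
FROM levels_per_user
```

Note that users who opened the app but never started a level have no rows in levels_per_user, so they are excluded here, whereas the question's original calculation divided by all users who started the application.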
