
How do I calculate an average time in BigQuery when the column uses the TIME type?

I'm trying to get the average time, but the AVG function does not support the TIME type. I tried using CAST, as some posts suggest, but that doesn't seem to work either. Thanks.

WITH october_fall AS
   (SELECT
   start_station_name,
   end_station_name,
   start_station_id,
   end_station_id,
   EXTRACT (DATE FROM started_at) AS start_date,
   EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
   EXTRACT (TIME FROM started_at) AS start_time,    
   EXTRACT (DATE FROM ended_at) AS end_date,
   EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,    
   EXTRACT (TIME FROM ended_at) AS end_time,
   DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_length,
   member_casual
FROM 
   `ciclystic.cyclistic_seasonal_analysis.fall_202010` AS fall_analysis
ORDER BY 
   started_at DESC)
SELECT
   COUNT (start_week_date) AS avg_start_1,
   AVG (start_time) AS avg_start_time_1, ## this is where the problem starts
   member_casual
FROM 
   october_fall
WHERE 
   start_week_date = 1
GROUP BY
   member_casual

BigQuery's AVG function does not accept the TIME type, so the query fails with an error.

Instead, you can compute the average on an INT64 value.
In the example below, time_ts is a TIMESTAMP column.
I use TIME_DIFF to get the number of seconds between each time and "00:00:00", average those values (AVG returns FLOAT64), and cast the result to INT64.
I then created a function, secondToTime, which splits the seconds into hours, minutes, and seconds and parses them back into a TIME value.

For the DATE type, I think you could do it the same way.
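
The answer does not show the date case, so here is a minimal sketch of one way it might look, assuming the built-in UNIX_DATE and DATE_FROM_UNIX_DATE functions are used to convert between DATE and a day count. The rides rows are invented sample data, not the asker's table.

```sql
-- Hypothetical sample data standing in for a real table
WITH rides AS (
  SELECT DATE '2020-10-01' AS start_date UNION ALL
  SELECT DATE '2020-10-03' UNION ALL
  SELECT DATE '2020-10-08'
)
SELECT
  -- UNIX_DATE gives days since 1970-01-01 (INT64); AVG returns FLOAT64,
  -- so cast back to INT64 (this rounds) before converting to a DATE
  DATE_FROM_UNIX_DATE(CAST(AVG(UNIX_DATE(start_date)) AS INT64)) AS avg_start_date
FROM rides
-- avg_start_date: 2020-10-04
```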

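create temp function secondToTime (seconds INT64)
    returns time 
    as (
        PARSE_TIME (
            "%H:%M:%S",
            concat(
                -- div() is integer division; cast(x / 3600 as int64) would round instead of truncating
                cast(div(seconds, 3600) as string),
                ":",
                cast(div(mod(seconds, 3600), 60) as string),
                ":",
                cast(mod(seconds, 60) as string)
            )
        )
    );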


with october_fall as (
    select
        extract (date from time_ts) as start_date,
        extract (time from time_ts) as start_time
    from `bigquery-public-data.hacker_news.comments`
    limit 10
) SELECT 
    avg(time_diff(start_time, time '00:00:00', second)),
    secondToTime(
        cast(avg(time_diff(start_time, time '00:00:00', second)) as INT64) 
    ),
    secondToTime(0),
    secondToTime(60),
    secondToTime(3601),
    secondToTime(7265)
FROM october_fall
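
If the helper function is more than you need, the same seconds-since-midnight idea can also be sketched with TIME_ADD. The query below is a minimal, self-contained sketch, not the only way to do it: the inline rows are made-up sample data standing in for the asker's october_fall CTE, and only the three columns the final query needs are included.

```sql
-- Hypothetical sample rows in place of the real october_fall CTE
WITH october_fall AS (
  SELECT TIME '08:00:00' AS start_time, 'member' AS member_casual, 1 AS start_week_date UNION ALL
  SELECT TIME '10:00:00', 'member', 1 UNION ALL
  SELECT TIME '09:30:00', 'casual', 1 UNION ALL
  SELECT TIME '12:30:00', 'casual', 1
)
SELECT
  member_casual,
  COUNT(*) AS ride_count,
  -- Average the seconds since midnight, round to whole seconds,
  -- then add them back onto midnight to get a TIME value
  TIME_ADD(
    TIME '00:00:00',
    INTERVAL CAST(AVG(TIME_DIFF(start_time, TIME '00:00:00', SECOND)) AS INT64) SECOND
  ) AS avg_start_time
FROM october_fall
WHERE start_week_date = 1
GROUP BY member_casual
-- member: 09:00:00, casual: 11:00:00
```

Keep in mind that this treats times as plain offsets from midnight, so rides clustered around midnight (for example 23:50 and 00:10) average out to midday.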

Try the following:

SELECT
   COUNT (start_week_date) AS avg_start_1,
   TIME(
     EXTRACT(hour   FROM AVG(start_time - '0:0:0')), 
     EXTRACT(minute FROM AVG(start_time - '0:0:0')), 
     EXTRACT(second FROM AVG(start_time - '0:0:0'))
   ) as avg_start_time_1,
   member_casual
FROM 
   october_fall
WHERE 
   start_week_date = 1
GROUP BY
   member_casual     

Another option would be

SELECT
   COUNT (start_week_date) AS avg_start_1,
   PARSE_TIME('0-0 0 %H:%M:%E*S', '' || AVG(start_time - '0:0:0')) as avg_start_time_1,
   member_casual
FROM 
   october_fall
WHERE 
   start_week_date = 1
GROUP BY
   member_casual     

I know a few months have passed, but someone else may run into the same issue. For the part of the query where the problem occurred, something like this worked for me and returned the average ride_length:

FORMAT_TIMESTAMP
  ('%T', 
  TIMESTAMP_SECONDS(CAST(AVG(TIME_DIFF(ride_length, '00:00:00', SECOND)) AS 
  INT64)))
   AS avg_ride_length
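
For context, here is that expression placed in a complete query, as a minimal sketch. The rides rows are invented sample data, and ride_length is assumed to be a TIME column. Note that FORMAT_TIMESTAMP returns a STRING rather than a TIME, and that this only makes sense while the average stays under 24 hours.

```sql
-- Hypothetical sample data; ride_length is assumed to be of type TIME
WITH rides AS (
  SELECT TIME '00:10:00' AS ride_length UNION ALL
  SELECT TIME '00:20:00'
)
SELECT
  FORMAT_TIMESTAMP(
    '%T',
    -- Seconds since midnight, averaged and rounded, reinterpreted as a
    -- timestamp on 1970-01-01 (UTC) purely so it can be formatted as HH:MM:SS
    TIMESTAMP_SECONDS(CAST(AVG(TIME_DIFF(ride_length, TIME '00:00:00', SECOND)) AS INT64))
  ) AS avg_ride_length
FROM rides
-- avg_ride_length: '00:15:00'
```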
