简体   繁体   中英

Create a user defined function that works with GROUP BY in mysql

I'm trying to create an aggregated function MEDIAN() in MySQL like MIN(), MAX(), AVG() which takes the input the column name or string that has concatenated values of the desired column.

I'm having trouble understanding the limitations of MySQL custom functions & would be really helpful if some can help me find out how this is done.

Example:

MySQL table has 2 columns (ID, num)

+----+-----+
| id | num |
+----+-----+
|  1 |   5 |
|  1 |   6 |
|  1 |   7 |
|  2 |   1 |
|  2 |   3 |
|  2 |   5 |
+----+-----+

SELECT id, MEDIAN(num) as median
FROM table
GROUP BY id;

OR

SELECT id, MEDIAN(GROUP_CONCAT(num SEPARATOR ',') as median
FROM table
GROUP BY id;

Expected Output is

+----+--------+
| id | median |
+----+--------+
|  1 |      6 |
|  2 |      3 |
+----+--------+

User defined aggregate stored functions were added in MariaDB-10.3.3

MySQL can do aggregate functions however not in SQL. They need a UDF (shared library implemenation)

It is possible to do without a UDF, and I know of two ways to do it. The first uses two selects and a join, the first select to get the values and rankings, and the second select to get the counts, then joins them. The second uses json functions to get everything in one select. They are both a little lengthy, but they work and are reasonably fast.

SOLUTION #1 (two selects and a join, one to get counts, one to get rankings)

SELECT  x.group_field, 
        avg(
            if( 
                x.rank - y.vol/2 BETWEEN 0 AND 1, 
                value_field, 
                null
            )
        ) as median
FROM (
    SELECT  group_field, value_field, 
            @r:= IF(@current=group_field, @r+1, 1) as rank, 
            @current:=group_field
    FROM (
        SELECT group_field, value_field
        FROM table_name
        ORDER BY group_field, value_field
    ) z, (SELECT @r:=0, @current:='') v
) x, (
    SELECT group_field, count(*) as vol 
    FROM table_name
    GROUP BY group_field
) y WHERE x.group_field = y.group_field
GROUP BY x.group_field;

SOLUTION #2 (uses a json object to store the counts and avoids the join)

SELECT group_field, 
    avg(
        if(
            rank - json_extract(@vols, path)/2 BETWEEN 0 AND 1,
            value_field,
            null
        )
    ) as median
FROM (
    SELECT group_field, value_field, path, 
        @rnk := if(@curr = group_field, @rnk+1, 1) as rank,
        @vols := json_set(
            @vols, 
            path, 
            coalesce(json_extract(@vols, path), 0) + 1
        ) as vols,
        @curr := group_field
    FROM (
        SELECT p.group_field, p.value_field, concat('$.', p.group_field) as path
        FROM table_name
        JOIN (SELECT @curr:='', @rnk:=1, @vols:=json_object()) v
        ORDER BY group_field, value_field DESC
    ) z
) y GROUP BY group_field;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM