简体   繁体   中英

How do I limit the number of rows per field value in SQL?

For example, I have a table like this in Hive:

1 1
1 4
1 8
2 1
2 5
3 1
3 2

and I want to only return the first two rows of each unique value of the first column. I want this to be able to limit the amount of data that I transfer from Hive into MySQL for reporting purposes. I'd like a single HiveQL query that gives me this:

1 1
1 4
2 1
2 5
3 1
3 2

Unluckily mysql doesn't have Analytical Functions. So you have to play with variables. Supposing you have an autoincrement field:

mysql> create table mytab (
    -> id int not null auto_increment primary key,
    -> first_column int,
    -> second_column int
    -> ) engine = myisam;
Query OK, 0 rows affected (0.05 sec)

mysql> insert into mytab (first_column,second_column)
    -> values
    -> (1,1),(1,4),(2,10),(3,4),(1,4),(2,5),(1,6);
Query OK, 7 rows affected (0.00 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> select * from mytab order by id;
+----+--------------+---------------+
| id | first_column | second_column |
+----+--------------+---------------+
|  1 |            1 |             1 |
|  2 |            1 |             4 |
|  3 |            2 |            10 |
|  4 |            3 |             4 |
|  5 |            1 |             4 |
|  6 |            2 |             5 |
|  7 |            1 |             6 |
+----+--------------+---------------+
7 rows in set (0.00 sec)

mysql> select
    -> id,
    -> first_column,
    -> second_column,
    -> row_num
    -> from (
    -> select *,
    -> @num := if(@first_column = first_column, @num:= @num + 1, 1) as row_num,
    -> @first_column:=first_column as c
    -> from mytab order by first_column,id) as t,(select @first_column:='',@num:
=0) as r;
+----+--------------+---------------+---------+
| id | first_column | second_column | row_num |
+----+--------------+---------------+---------+
|  1 |            1 |             1 |       1 |
|  2 |            1 |             4 |       2 |
|  5 |            1 |             4 |       3 |
|  7 |            1 |             6 |       4 |
|  3 |            2 |            10 |       1 |
|  6 |            2 |             5 |       2 |
|  4 |            3 |             4 |       1 |
+----+--------------+---------------+---------+
7 rows in set (0.00 sec)

mysql> select
    -> id,
    -> first_column,
    -> second_column,
    -> row_num
    -> from (
    -> select *,
    -> @num := if(@first_column = first_column, @num:= @num + 1, 1) as row_num,
    -> @first_column:=first_column as c
    -> from mytab order by first_column,id) as t,(select @first_column:='',@num:
=0) as r
    -> having row_num<=2;
+----+--------------+---------------+---------+
| id | first_column | second_column | row_num |
+----+--------------+---------------+---------+
|  1 |            1 |             1 |       1 |
|  2 |            1 |             4 |       2 |
|  3 |            2 |            10 |       1 |
|  6 |            2 |             5 |       2 |
|  4 |            3 |             4 |       1 |
+----+--------------+---------------+---------+
5 rows in set (0.02 sec)

A Hive solution would be

SELECT S.col1, S.col2
FROM
 (SELECT col1, col2, row_number() over (partition by col1) as r FROM mytable) S
WHERE S.r < 3

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM