I am trying to create a classifier model for a dataset, but I have too many distinct values for my target variable. If I run something like this:
CREATE OR REPLACE MODEL `model_name`
OPTIONS (model_type="AUTOML_CLASSIFIER", input_label_cols=["ORIGIN_AIRPORT"]) AS
SELECT DAY_OF_WEEK, ARRIVAL_TIME, ARRIVAL_DELAY, ORIGIN_AIRPORT
FROM `table_name`
LIMIT 1000
I end up getting
Error running query
Classification model currently only supports classification with up to 50 unique labels and the label column had 111 unique labels.
So how can I select, for example, all rows that have one of the first 50 values of ORIGIN_AIRPORT?
Given a table of values (val) with unique identifiers (id), find the minimum id (mid) for each distinct val.
Then return all rows whose val is among the first 3 vals, densely ranked by mid.
The test data:
+------+----+
| val  | id |
+------+----+
|    1 |  1 |
|    1 |  2 |
|    8 |  3 |
|    8 |  4 |
|    8 |  5 |
|    7 |  6 |
|    7 |  7 |
|    6 |  8 |
|    5 |  9 |
|    4 | 10 |
|    3 | 11 |
|    3 | 12 |
|    7 | 13 |
|    7 | 14 |
|    1 | 15 |
|    8 | 16 |
|    3 | 17 |
|    1 | 18 |
+------+----+
The solution:
WITH min_ids (val, id, mid) AS (
    SELECT val
         , id
         , MIN(id) OVER (PARTITION BY val) AS mid  -- min id per val
    FROM vals
)
, ranks (val, id, mid, r) AS (
    SELECT val
         , id
         , mid
         , DENSE_RANK() OVER (ORDER BY mid) AS r  -- Densely ranked minimum ids
    FROM min_ids
)
SELECT *
FROM ranks
WHERE r <= 3  -- Return rows matching r <= 3 (vals = 1, 8, and 7)
ORDER BY r, id
;
The final result:
+------+----+------+---+
| val  | id | mid  | r |
+------+----+------+---+
|    1 |  1 |    1 | 1 |
|    1 |  2 |    1 | 1 |
|    1 | 15 |    1 | 1 |
|    1 | 18 |    1 | 1 |
|    8 |  3 |    3 | 2 |
|    8 |  4 |    3 | 2 |
|    8 |  5 |    3 | 2 |
|    8 | 16 |    3 | 2 |
|    7 |  6 |    6 | 3 |
|    7 |  7 |    6 | 3 |
|    7 | 13 |    6 | 3 |
|    7 | 14 |    6 | 3 |
+------+----+------+---+
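To carry this over to the table in the question, here is a rough sketch rather than a tested solution. The question does not show an id-like column, so it assumes that "first" can mean alphabetical order of ORIGIN_AIRPORT itself; if the table has a column that defines a different order, rank by that instead (as with mid above). The alias r and the derived table are only illustrative.

```sql
-- Sketch only: `table_name` and the column names are taken from the question.
-- Assumption: "first 50" means the 50 lowest ORIGIN_AIRPORT values in sort order.
SELECT DAY_OF_WEEK, ARRIVAL_TIME, ARRIVAL_DELAY, ORIGIN_AIRPORT
FROM (
    SELECT DAY_OF_WEEK
         , ARRIVAL_TIME
         , ARRIVAL_DELAY
         , ORIGIN_AIRPORT
         , DENSE_RANK() OVER (ORDER BY ORIGIN_AIRPORT) AS r  -- one rank per distinct airport
    FROM `table_name`
) AS ranked
WHERE r <= 50  -- keep rows for at most 50 distinct labels
;
```

A SELECT of this shape could stand in for the SELECT inside the CREATE OR REPLACE MODEL statement. Note that a LIMIT applied afterwards only reduces the row count; it cannot raise the number of distinct labels above 50.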
There are a number of solutions. We could instead have obtained a distinct list of vals, restricted it to N rows with LIMIT, and then joined the table back to that list.
Since your original question asked for the rows matching the first N values, I used that stricter logic.
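For comparison, the distinct-list-and-join alternative might look roughly like this. It is a sketch against the same vals test table; the CTE name first_vals and the aliases v and f are made up for illustration, and it assumes a dialect that accepts ORDER BY ... LIMIT inside a CTE (MySQL 8, PostgreSQL, SQLite and BigQuery do; SQL Server would need TOP instead).

```sql
-- Alternative sketch: pick the first 3 vals (by minimum id), then join back.
WITH first_vals AS (
    SELECT val
         , MIN(id) AS mid  -- min id per val
    FROM vals
    GROUP BY val
    ORDER BY mid           -- without this, LIMIT would pick arbitrary vals
    LIMIT 3
)
SELECT v.val
     , v.id
     , f.mid
FROM vals AS v
JOIN first_vals AS f
  ON f.val = v.val
ORDER BY f.mid, v.id
;
```

On the test data this should select the same 12 rows as the DENSE_RANK version (vals 1, 8, and 7), only without the r column. If the ORDER BY inside first_vals is dropped, the query still returns rows for 3 distinct vals, but which 3 is up to the engine.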