How to sample from different values in a column but only return records that are unique from another column?
I'm struggling with a sampling problem in Teradata.
Below is the format of the data:
ID Group Rank
1 dog 1
1 cat 1
1 lion 1
1 elephant 2
2 dog 1
2 cat 1
2 lion 1
2 elephant 1
3 dog 1
3 cat 2
3 lion 1
3 elephant 1
4 dog 2
4 cat 1
4 lion 1
4 elephant 1
...
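Ideally, I want to return a set number of sampled rows for each entry in Group, but with each ID appearing only once.
Below is my current query, but it returns duplicate IDs: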
SELECT ID, Group FROM Table
WHERE rank = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
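To see why the query above returns duplicate IDs, here is a small Python model of a stratified sample. It is not Teradata code and not how Teradata implements SAMPLE internally; it only mimics the idea that each group is sampled on its own, so nothing stops the same ID from being drawn in several groups. The data is the sample table above. The quotas are an assumption, scaled down to 2 per group because the real 10/10/5/5 would exceed this tiny table. Drawing 8 rows from only 4 distinct IDs forces at least one ID to repeat.

```python
import random
from collections import Counter

# Sample data from the question: (ID, Group, Rank)
ROWS = [
    (1, "dog", 1), (1, "cat", 1), (1, "lion", 1), (1, "elephant", 2),
    (2, "dog", 1), (2, "cat", 1), (2, "lion", 1), (2, "elephant", 1),
    (3, "dog", 1), (3, "cat", 2), (3, "lion", 1), (3, "elephant", 1),
    (4, "dog", 2), (4, "cat", 1), (4, "lion", 1), (4, "elephant", 1),
]

def sample_per_group(rows, quotas, rng):
    """Draw up to quotas[group] rank-1 rows from each group independently.
    There is no coordination across groups, so an ID can be picked twice."""
    picked = []
    for group, n in quotas.items():
        candidates = [r for r in rows if r[1] == group and r[2] == 1]
        picked.extend(rng.sample(candidates, min(n, len(candidates))))
    return picked

# Hypothetical quotas, scaled down to fit the 4-ID sample data.
quotas = {"dog": 2, "cat": 2, "elephant": 2, "lion": 2}
picked = sample_per_group(ROWS, quotas, random.Random(0))
id_counts = Counter(r[0] for r in picked)
print(sorted(id_counts.items()))
```

Whatever the seed, at least one ID shows a count greater than 1 here, which is the duplication the question describes.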
with cte as
(
SELECT ID, Group,
random(1,10000) as rnd -- RANDOM can't be directly used in OLAP-functions
FROM Table
WHERE rank = 1
)
SELECT ID, Group
FROM cte
QUALIFY
ROW_NUMBER() -- get one random row per ID
OVER (PARTITION BY ID
ORDER BY rnd) = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
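A minimal Python sketch of the two steps in the query above, assuming the same sample data. First, a random key is attached to each rank = 1 row and the row with the smallest key is kept per ID (the CTE plus QUALIFY ROW_NUMBER() ... = 1). Then up to the requested number of survivors is drawn per group (the SAMPLE WHEN part). The function name one_row_per_id_then_sample is hypothetical; this only models the logic and is not Teradata code.

```python
import random
from collections import defaultdict

# Sample data from the question: (ID, Group, Rank)
ROWS = [
    (1, "dog", 1), (1, "cat", 1), (1, "lion", 1), (1, "elephant", 2),
    (2, "dog", 1), (2, "cat", 1), (2, "lion", 1), (2, "elephant", 1),
    (3, "dog", 1), (3, "cat", 2), (3, "lion", 1), (3, "elephant", 1),
    (4, "dog", 2), (4, "cat", 1), (4, "lion", 1), (4, "elephant", 1),
]

def one_row_per_id_then_sample(rows, quotas, rng):
    # Step 1 (CTE + QUALIFY): give every rank-1 row a random key and keep,
    # for each ID, the row with the smallest key.
    best = {}
    for row in rows:
        if row[2] != 1:
            continue
        key = rng.random()
        if row[0] not in best or key < best[row[0]][0]:
            best[row[0]] = (key, row)

    # Step 2 (SAMPLE WHEN): bucket the survivors by group and draw up to
    # quotas[group] of them from each bucket.
    by_group = defaultdict(list)
    for _key, row in best.values():
        by_group[row[1]].append(row)
    result = []
    for group, n in quotas.items():
        candidates = by_group[group]
        result.extend(rng.sample(candidates, min(n, len(candidates))))
    return result

quotas = {"dog": 10, "cat": 10, "elephant": 5, "lion": 5}
result = one_row_per_id_then_sample(ROWS, quotas, random.Random(42))
for row in result:
    print(row)
```

Because IDs are deduplicated before the per-group sample, no ID can appear twice in the result. With only four IDs in the sample data, every group comes back well below its quota.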
Assuming you have enough records, pick one random row per id, and then choose the appropriate number of rows from those:
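select t.*
from (select t.*,
             -- number the one-row-per-id survivors within each group, in random order
             row_number() over (partition by group order by rnd_g) as seqnum_g
      from (select t.*,
                   -- keep one random row per id
                   row_number() over (partition by id order by rnd_id) as seqnum
            from (select t.*,
                         -- RANDOM can't be used directly in OLAP functions, so generate the keys first
                         random(1, 1000000) as rnd_id,
                         random(1, 1000000) as rnd_g
                  from t
                  where rank = 1   -- same filter as in the question
                 ) t
           ) t
      where seqnum = 1
     ) t
where (group in ('dog', 'cat') and seqnum_g <= 10) or
      (group in ('elephant', 'lion') and seqnum_g <= 5);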
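This does not guarantee that each group will be large enough in the result set. However, if you have enough data relative to the requested group sizes, it should work.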
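Since the result can come up short, it may be worth checking the per-group counts after running the query. A small Python sketch, assuming the query's output has been fetched as (ID, Group) pairs; group_shortfall and the result rows below are hypothetical:

```python
from collections import Counter

def group_shortfall(result_rows, quotas):
    """Return {group: missing_rows} for every group whose count in the
    result is below its requested quota."""
    counts = Counter(group for _id, group in result_rows)
    return {g: n - counts[g] for g, n in quotas.items() if counts[g] < n}

# Hypothetical query output: (ID, Group) pairs, one row per ID.
result_rows = [(1, "lion"), (2, "dog"), (3, "elephant"), (4, "cat"), (5, "dog")]
quotas = {"dog": 10, "cat": 10, "elephant": 5, "lion": 5}
print(group_shortfall(result_rows, quotas))
# {'dog': 8, 'cat': 9, 'elephant': 4, 'lion': 4}
```

An empty dict means every group reached its quota.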