How can I evenly divide records into N groups based on the values?
Given the table below, how can I divide the records evenly into 3 groups based on the value of “factor_value”?
sym    date       factor_value
------ ---------- ------------
100000 2022.04.27 1
100001 2022.04.27 2
100002 2022.04.27 3
100003 2022.04.27 4
100004 2022.04.27 5
100005 2022.04.27 6
100006 2022.04.27 7
100007 2022.04.27 8
100008 2022.04.27 9
100009 2022.04.27 10
100010 2022.04.28
100000 2022.04.28
100001 2022.04.28
100002 2022.04.28 3
100003 2022.04.28 4
100004 2022.04.28 5
100005 2022.04.28 6
100006 2022.04.28 7
100007 2022.04.28 8
100008 2022.04.28 9
This can be implemented with the DolphinDB functions cutPoints and asof.
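// build the sample table: 11 symbols cycled over 20 rows, 10 rows per date
sym = take(string(100000..100010), 20)
date = sort(take(2022.04.27..2022.04.28, 20))
// the first 3 factor values of the second date are NULL
factor_value = 1..10 join take(int(), 3) join 3..9
tb = table(sym, date, factor_value)
// per date: sort by factor_value, compute the cut points for 3 groups, map each row to its group
select *, asof(cutPoints(int(factor_value*100000), 3), factor_value*100000) + 1 as factor_quantile from tb context by date csort factor_value having size(distinct(factor_value*100000)) > 3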
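First, use context by with csort to sort the records of each date by the column factor_value. Then cutPoints computes the boundaries that split the sorted records evenly into 3 groups. Finally, asof returns, for each record, the index of the last boundary that does not exceed its value; adding 1 turns that index into a group number starting from 1. The having clause keeps only the dates that have more than 3 distinct factor values.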
Output:
sym    date       factor_value factor_quantile
------ ---------- ------------ ---------------
100000 2022.04.27 1            1
100001 2022.04.27 2            1
100002 2022.04.27 3            1
100003 2022.04.27 4            1
100004 2022.04.27 5            2
100005 2022.04.27 6            2
100006 2022.04.27 7            2
100007 2022.04.27 8            3
100008 2022.04.27 9            3
100009 2022.04.27 10           3
100010 2022.04.28              1
100000 2022.04.28              1
100001 2022.04.28              1
100002 2022.04.28 3            1
100003 2022.04.28 4            2
100004 2022.04.28 5            2
100005 2022.04.28 6            2
100006 2022.04.28 7            3
100007 2022.04.28 8            3
100008 2022.04.28 9            3
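
To make the cutPoints and asof step easier to follow, here is a rough Python emulation of the logic for the 2022.04.27 rows. It is not DolphinDB code and not the library's implementation: the helper names cut_points and asof are hypothetical, and the boundary rule (group i starts at sorted index ceil(i*n/bins), with a final boundary of max + 1) is an assumption chosen to reproduce the group sizes shown in the output above. NULL values are not handled in this sketch.

```python
from bisect import bisect_right


def cut_points(values, bins):
    """Boundaries splitting the sorted values into `bins` near-equal groups.

    Assumption: group i starts at sorted index ceil(i * n / bins), and the
    last boundary is max + 1 so the largest value lands in the last group.
    """
    ordered = sorted(values)
    n = len(ordered)
    # integer ceiling division avoids floating-point rounding
    starts = [ordered[(i * n + bins - 1) // bins] for i in range(bins)]
    return starts + [ordered[-1] + 1]


def asof(boundaries, y):
    """Index of the last boundary that is <= y, or -1 if there is none."""
    return bisect_right(boundaries, y) - 1


# factor values of the first date, scaled by 100000 as in the query
scaled = [v * 100000 for v in range(1, 11)]
bounds = cut_points(scaled, 3)
# +1 turns the zero-based boundary index into a group number starting at 1
quantile = [asof(bounds, s) + 1 for s in scaled]
print(bounds)
print(quantile)
```

Under these assumptions the ten values fall into groups of 4, 3 and 3 records, matching the factor_quantile column for 2022.04.27.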