
How can I evenly divide records into N groups based on the values?

For the table below, how can I divide the records evenly into 3 groups based on the value of "factor_value"?

sym    date       factor_value
------ ---------- ------------
100000 2022.04.27 1
100001 2022.04.27 2
100002 2022.04.27 3
100003 2022.04.27 4
100004 2022.04.27 5
100005 2022.04.27 6
100006 2022.04.27 7
100007 2022.04.27 8
100008 2022.04.27 9
100009 2022.04.27 10
100010 2022.04.28
100000 2022.04.28
100001 2022.04.28
100002 2022.04.28 3
100003 2022.04.28 4
100004 2022.04.28 5
100005 2022.04.28 6
100006 2022.04.28 7
100007 2022.04.28 8
100008 2022.04.28 9

This can be implemented with the DolphinDB functions cutPoints and asof.

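// build the sample table: 20 rows over two dates, with 3 empty factor values on 2022.04.28
sym = take(string(100000..100010), 20)
date = sort(take(2022.04.27..2022.04.28, 20))
factor_value = 1..10 join take(int(), 3) join 3..9
tb = table(sym, date, factor_value)
// within each date, sort by factor_value and assign every row to one of 3 groups
select *, asof(cutPoints(int(factor_value*100000), 3), factor_value*100000)+1 as factor_quantile from tb context by date csort factor_value having size(distinct(factor_value*100000))>3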

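First, use context by with csort to sort the column factor_value within each date. Then cutPoints computes the boundaries that split the records of each date into 3 groups as evenly as possible. For each record, asof returns the index of the last boundary that is not greater than its value, and adding 1 turns that index into a group number starting at 1. The having clause keeps only the dates that have more than 3 distinct factor values.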
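To make the boundary-then-lookup idea concrete, here is a minimal Python sketch of the same two steps for a single date. It is an illustration only, not DolphinDB's implementation: the helper names cut_points and assign_groups are hypothetical, the rule for placing the boundaries is an assumption chosen to reproduce the 4/3/3 split shown for 2022.04.27, and empty values are not handled.

```python
from bisect import bisect_right


def cut_points(values, n_groups):
    """Return n_groups + 1 boundaries that split the sorted values
    into buckets of near-equal size (hypothetical helper)."""
    ordered = sorted(values)
    if n_groups < 1 or len(ordered) < n_groups:
        raise ValueError("need at least one value per group")
    size, extra = divmod(len(ordered), n_groups)
    points, start = [], 0
    for g in range(n_groups):
        points.append(ordered[start])            # lower bound of bucket g
        start += size + (1 if g < extra else 0)  # earlier buckets absorb the remainder
    points.append(ordered[-1] + 1)               # upper bound just above the maximum
    return points


def assign_groups(values, n_groups):
    """Map each value to a 1-based group number via an as-of style lookup."""
    points = cut_points(values, n_groups)
    # bisect_right counts the boundaries <= v, which equals
    # (index of the last boundary not greater than v) + 1.
    return [bisect_right(points, v) for v in values]


if __name__ == "__main__":
    factor_values = list(range(1, 11))  # the 2022.04.27 rows
    print(cut_points(factor_values, 3))
    print(assign_groups(factor_values, 3))
```

With the ten values of 2022.04.27, the first four records land in group 1 and the remaining six are split three and three, which agrees with the factor_quantile column for that date.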

Output:

sym    date       factor_value factor_quantile
------ ---------- ------------ ---------------
100000 2022.04.27 1            1              
100001 2022.04.27 2            1              
100002 2022.04.27 3            1              
100003 2022.04.27 4            1              
100004 2022.04.27 5            2              
100005 2022.04.27 6            2              
100006 2022.04.27 7            2              
100007 2022.04.27 8            3              
100008 2022.04.27 9            3              
100009 2022.04.27 10           3              
100010 2022.04.28              1              
100000 2022.04.28              1              
100001 2022.04.28              1              
100002 2022.04.28 3            1              
100003 2022.04.28 4            2              
100004 2022.04.28 5            2              
100005 2022.04.28 6            2              
100006 2022.04.28 7            3              
100007 2022.04.28 8            3              
100008 2022.04.28 9            3   
 
