
How to generate sample data based on weighted base values

Using SQL, I need to generate N rows of sample data from a set of base values that have assigned weights.

For example: if my base values and their corresponding weights are:

  • a-1,
  • b-2,
  • c-3,
  • d-4,
  • e-5

And if the sample size required is 15, then the rowset returned should have:

  • 5 rows of e,
  • 4 rows of d,
  • 3 rows of c,
  • 2 rows of b,
  • and 1 row of a

for a total of 15 rows.

In SQL Server, you can use LEFT and RIGHT to split each value into its name and weight, then use a recursive CTE to expand every value into multiple rows. The same logic carries over to other RDBMSs.

Table & DDL

|val|
|--- |
|a-1|
|b-2|
|c-3|
|d-4|
|e-5|
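
The heading mentions DDL, but only the data is shown. A minimal sketch of how the table could be created, assuming it is named t (the name the query reads from) with a single column val; the varchar(20) type and length are assumptions.

```sql
-- Assumed DDL: a table t with one column val (type and length are guesses).
create table t (val varchar(20) not null);

-- Each row holds a base value and its weight, separated by a hyphen.
insert into t (val)
values ('a-1'), ('b-2'), ('c-3'), ('d-4'), ('e-5');
```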

Query SQL

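with cte as (
  -- split 'a-1' into the value (left of the hyphen) and its weight (right of it)
  select
    left(val, CHARINDEX('-', val) - 1) as id,
    convert(int, right(val, len(val) - CHARINDEX('-', val))) as cnt
  from t
)
, cte2 as (
  -- anchor: first row per value; cnt counts down the rows still to emit
  select T1.id, T1.cnt - 1 as cnt from cte T1
  union all
  -- recursive step: emit another row while the countdown is above zero
  select T1.id, T2.cnt - 1 as cnt from cte T1
  inner join cte2 T2 on T1.id = T2.id and T2.cnt > 0
)
select id from cte2
order by id, cnt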

online demo link | db<>fiddle
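
One caveat with the recursive approach: SQL Server stops a recursive CTE after 100 recursion levels by default, so a weight well above 100 raises an error. A hedged sketch of the same idea with the limit lifted through the MAXRECURSION query hint. The table variable @w and its id and cnt columns are hypothetical stand-ins for values that have already been split.

```sql
-- Hypothetical input: base values with weights already split into columns.
declare @w table (id varchar(10), cnt int);
insert into @w (id, cnt) values ('a', 1), ('b', 250);

with expanded as (
  -- anchor: one row per value that has a positive weight
  select id, cnt, 1 as k from @w where cnt > 0
  union all
  -- recursive step: keep adding rows until k reaches the weight
  select id, cnt, k + 1 from expanded where k < cnt
)
select id
from expanded
order by id, k
option (maxrecursion 0);  -- 0 removes the default limit of 100 levels
```

Filtering on cnt > 0 in the anchor also keeps zero-weight values out of the result.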

You need a way to generate the rows, such as a numbers table. Assuming you have one, the rest of the problem is essentially arithmetic.
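
If no numbers table exists yet, there are several ways to build one. A minimal sketch, assuming SQL Server and a table named n with a single integer column n (the names the query in this answer expects). The 1000-row size is an arbitrary choice.

```sql
-- Assumed helper: a numbers table n(n) holding 1, 2, 3, ...
create table n (n int not null primary key);

-- Number the rows of a cross join of a system view and keep the first 1000.
insert into n (n)
select rn
from (select row_number() over (order by (select null)) as rn
      from sys.all_objects a
      cross join sys.all_objects b
     ) x
where rn <= 1000;
```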

The following works if the number of rows is an exact multiple of the sum of the weights. It assumes t has separate value and weight columns and n is the numbers table:

select *
from (select t.*, sum(weight) over () as sum_weight,
             -- running total of weights; the ordering must be unique so each value gets its own range
             sum(weight) over (order by value) as running_weight
      from t
     ) t join
     n
     on n.n % sum_weight >= running_weight - weight and
        n.n % sum_weight < running_weight
where n.n <= 15
order by value;

Here is a db<>fiddle. The fiddle uses SQL Server, but the query is essentially standard SQL.
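
The join condition is easier to follow once the running totals are written out: each value owns a half-open range from running_weight - weight up to (but not including) running_weight, and n % sum_weight falls into exactly one of them. A small sketch that lists those ranges, assuming the sample weights from the question and a running total ordered by value. The aliases range_start and range_end are illustrative only.

```sql
-- Sample weights from the question, supplied inline so the query is self-contained.
select value,
       weight,
       sum(weight) over (order by value) - weight as range_start,  -- inclusive
       sum(weight) over (order by value)          as range_end     -- exclusive
from (values ('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)) as t (value, weight)
order by value;
-- a: [0, 1), b: [1, 3), c: [3, 6), d: [6, 10), e: [10, 15)
```

For n from 1 to 15, n % 15 takes each of the values 0 through 14 exactly once, so a receives 1 row, b 2, c 3, d 4, and e 5.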
