
Random Variable Selection in SQL

I have seen a few questions like this, but nothing has answered what I'm looking for.

I have 5,000 rows of data spanning more than 3 years. Every row has a memberID. A memberID identifies one individual, but it appears in the column once for every time that individual is in the system during those 3 years.

How can I pull 100 random memberIDs from those 3 years? (The result would normally have more than 100 rows, because memberIDs can repeat.)

EDIT: I should clarify, Member ID is character, not numeric. Ex: W4564

NOTE: This is NOT looking for n rows, rather 100 different IDs over the course of 3 years, so an ID might be associated with 3 rows in the result. The result will have a differing number of rows each time the SQL is run.

Depending on how your data is indexed, you could simply grab the rows with the memberID from a subquery. For example:

SELECT *
FROM <yourtable>
WHERE memberID IN (SELECT DISTINCT TOP 100 memberID FROM <yourtable>)

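Note that TOP without ORDER BY returns whichever 100 memberIDs the engine reaches first, which depends on your index and the query plan. That is arbitrary rather than random, and it will often be the same set on every run. To force a random pick, sort by newid(). SQL Server rejects SELECT DISTINCT combined with an ORDER BY expression that is not in the select list, so deduplicate in a derived table first:

SELECT *
FROM <yourtable>
WHERE memberID IN (
    SELECT TOP 100 memberID
    FROM (SELECT DISTINCT memberID FROM <yourtable>) AS ids
    ORDER BY newid()
)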
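
As a rough illustration of the random-pick idea on concrete data, here is a minimal sketch. The table variables @Visits and @Picked, the visitDate column, and the sample rows are made up for the example; only memberID and the W4564 format come from the question. It picks 2 IDs instead of 100 so the effect is visible on six rows.

```sql
-- Hypothetical sample data standing in for <yourtable>.
DECLARE @Visits TABLE (memberID varchar(10), visitDate date);

INSERT INTO @Visits (memberID, visitDate) VALUES
    ('W4564', '2022-01-10'),
    ('W4564', '2023-03-02'),
    ('A1001', '2022-06-15'),
    ('B2002', '2021-11-30'),
    ('B2002', '2022-12-01'),
    ('C3003', '2023-05-20');

-- Step 1: pick 2 distinct IDs at random and store them.
-- DISTINCT is applied in a derived table so that ORDER BY NEWID() is legal.
DECLARE @Picked TABLE (memberID varchar(10) PRIMARY KEY);

INSERT INTO @Picked (memberID)
SELECT TOP (2) ids.memberID
FROM (SELECT DISTINCT memberID FROM @Visits) AS ids
ORDER BY NEWID();

-- Step 2: return every row that belongs to a picked ID.
-- The row count varies between runs because IDs repeat a different number of times.
SELECT v.memberID, v.visitDate
FROM @Visits AS v
JOIN @Picked AS p
    ON p.memberID = v.memberID
ORDER BY v.memberID, v.visitDate;
```

Storing the picked IDs before joining is a cautious design choice rather than a requirement. As far as I know, SQL Server does not promise to evaluate a subquery containing NEWID() exactly once, so materializing the sample first removes any doubt that all rows are matched against the same set of IDs.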

Ordering by newid() sorts the rows randomly. The where exists clause restricts the candidates to members that have data in the past three years. That filter has to be applied at this stage, before the random pick; otherwise the 100 picked members could include some, or even only, members with no recent data at all. Finally, top 100 keeps just the first 100 rows of the randomly ordered set.

Combined, this returns 100 random member IDs for which data exists in the past three years. It assumes a Member table with one row per member and a MemberData table that holds the dated rows:

select top 100
  m.MemberID
from
  Member m
where
  exists (select 'x' 
          from MemberData d 
          where d.MemberId = m.MemberId
                and d.DataDate > dateadd(year, -3, getdate()))
order by 
  newid()

You can then use that query in an IN clause to fetch rows from the same MemberData table, or from any other table for that matter:

select
  md.*
from
  MemberData md
where
  -- Same filter to get only the recent data
  md.DataDate > dateadd(year, -3, getdate()) and
  -- Only of 100 random members that have been active in the past 3 years.
  md.MemberId in (
    select top 100
      m.MemberID
    from
      Member m
    where
      exists (select 'x' 
              from MemberData d 
              where d.MemberId = m.MemberId
                    and d.DataDate > dateadd(year, -3, getdate()))
    order by 
      newid()
  )
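
If there is no separate Member table, as seems to be the case in the question, the same idea can be sketched against a single table with common table expressions, so the three-year filter is written only once. This is a sketch under assumptions: the table variable @MemberData and its sample rows are hypothetical, the column names MemberId and DataDate are borrowed from the query above, and it picks 2 members instead of 100 to suit the tiny data set.

```sql
-- Hypothetical sample data; dates are relative to today so the filter stays meaningful.
DECLARE @MemberData TABLE (MemberId varchar(10), DataDate datetime);

INSERT INTO @MemberData (MemberId, DataDate) VALUES
    ('W4564', DATEADD(month, -2,  GETDATE())),
    ('W4564', DATEADD(month, -20, GETDATE())),
    ('A1001', DATEADD(year,  -5,  GETDATE())),  -- older than 3 years, never eligible
    ('B2002', DATEADD(month, -30, GETDATE())),
    ('C3003', DATEADD(month, -1,  GETDATE()));

WITH Recent AS (
    -- Rows from the past three years only.
    SELECT d.MemberId, d.DataDate
    FROM @MemberData AS d
    WHERE d.DataDate > DATEADD(year, -3, GETDATE())
),
Picked AS (
    -- Random pick among the distinct members that have recent rows.
    SELECT TOP (2) ids.MemberId
    FROM (SELECT DISTINCT MemberId FROM Recent) AS ids
    ORDER BY NEWID()
)
SELECT r.MemberId, r.DataDate
FROM Recent AS r
WHERE r.MemberId IN (SELECT p.MemberId FROM Picked AS p);
```

Because the pick is made from Recent rather than from the full table, a member such as A1001, whose only row is too old, can never take one of the random slots.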

