
How do I get a second sample from a dataset in Python without duplicating items from the first sample?

I have a dataset in Python. I have managed to take a sample from it and put that sample in a second dataset. Next, I need to produce another sample from the original dataset, but I do not want any item from the first sample to come up again. Ideally, the flag that excludes an item would last only a year, so that the item can be sampled again once that time has elapsed.

Denote your original dataset by A. You generate a subset of A; call it B1. You can then create B2 from A_leftover = A \ B1, where \ denotes the set difference. You can then generate B3, B4, ..., B12 the same way: each Bi is drawn from A_leftover after updating A_leftover = A_leftover \ B(i-1).
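The set-difference step can be sketched in plain Python. This is only a minimal illustration: it assumes each record has a unique, sortable ID, and the names `draw_sample`, `A`, `B1` and `B2` are hypothetical, not taken from the question.

```python
import random

def draw_sample(population, excluded, k, rng=random):
    """Draw k distinct IDs from population, skipping anything in excluded."""
    # A_leftover = A \ excluded; sorted() gives random.sample a sequence to work on
    leftover = sorted(set(population) - set(excluded))
    # random.sample draws without replacement and raises ValueError if k > len(leftover)
    return set(rng.sample(leftover, k))

A = set(range(100))            # hypothetical dataset: 100 record IDs
B1 = draw_sample(A, set(), 10) # first sample, nothing excluded yet
B2 = draw_sample(A, B1, 10)    # second sample, drawn from A \ B1
```

Because B2 is drawn only from A \ B1, the two samples cannot share an ID.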

If you want to put B1 back in the next year, set A_leftover = (A_leftover \ B12) U B1, and from this you can generate B13 (or you can call it B1 again, since 13 % 12 = 1). So after 12, you generate Bi from A_leftover = (A_leftover \ B(i-1)) U B(i-12). Or you can use this formula from the very beginning by defining B(-i) = empty set for every i in [0, 1, 2, ..., 11].
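One way to express the rolling rule is to keep only the most recent samples and exclude their union, rather than updating A_leftover by hand. The sketch below assumes one sample per month and a 12-month cooldown, so an item drawn in one period is eligible again 12 periods later. The function name `rolling_samples` and its parameters are hypothetical.

```python
import random
from collections import deque

def rolling_samples(population, k, periods, cooldown=12, seed=None):
    """Draw one sample of size k per period; an item drawn in period i
    stays excluded until period i + cooldown."""
    rng = random.Random(seed)
    # Only the previous cooldown - 1 samples are still "flagged";
    # the deque silently drops the oldest one when it is full.
    recent = deque(maxlen=cooldown - 1)
    samples = []
    for _ in range(periods):
        excluded = set().union(*recent)                 # union of flagged samples
        leftover = sorted(set(population) - excluded)   # A minus flagged items
        batch = set(rng.sample(leftover, k))            # raises ValueError if too few remain
        recent.append(batch)
        samples.append(batch)
    return samples

# Hypothetical usage: 120 record IDs, 10 per month, two years of samples
samples = rolling_samples(range(120), k=10, periods=24, seed=0)
```

With 120 IDs and 10 per month, the first 12 samples use up the whole dataset, so the 13th sample is necessarily the same set of IDs as the first. With a larger dataset the 13th sample may or may not overlap the first.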


 