简体   繁体   中英

Sampling without replacement from an unknown set

I have to extract uniformly DFAs (deterministic finite automata) from a set of DFA that I call . 一组DFA中统一提取DFA(确定性有限自动机)。 It seems a simple problem, but I haven't the set . contains all the DFA of dimension , thus I know dimension of , I can build but I can't because is very large. 包含维度所有DFA,因此我知道维度,我可以构建但由于非常大而不能。 I know also the dimension of sets Sm where for example S3 is a subset of S and S3 contains all the DFA with 3 states, Sm contains all the DFA with m states where m < n .

I have not the set S and thus I have to simulate a uniformly sampling. Furthermore I have to do sampling without replacement. I create a set D={1,2,3........n} and for each value, that I call i , in D I associate the value |Si|/|S| where | | indicates the number of elements in the set that is the argument. Namely I created a distribution. Now I can extract a value from D according to this distribution. In this way I found the set from which extract a single DFA. For example if from DI extract 4 then I have to extract uniformly a DFA from S4 .

But my question is, how can I sample a DFA from Si ( S4 in the example above) without replacement? Namely if I have already extracted previously a specific DFA, in the next sampling I must avoid that specific DFA. Remark that a DFA is a matrix, a table (a bidimensional array). Remark also that extract a specific DFA means uniformly extract for each cell of the above mentioned table a value in {1,.....,k} where k is the number of the elements of the alphabet (you have to extract also for each state if is accepting or rejecting).

(I have to implement in C++11 but this is pretty irrelevant)

If i understand your problem correctly, the trivial solution would be to keep every sampled DFA, and upon generating a new random one - check if it has been generated before. I guess your problem is the large amount of memory required to store them all.

If so, you may keep only the hash of each DFA - eg a 128-bit MurmurHash3, and compare the hash of newly generated DFAs with the stored hashes.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM