Set Combinatorics Algorithm in Java

I have a data set with attributes like this:

Marital_status = {M,S,W,D}
IsBlind = {Y,N}
IsDisabled = {Y,N}
IsVeteran = {Y,N}

etc. There are about 200 such variables.

I need an algorithm to generate combinations of the attributes, with exactly one value per attribute in each combination.

In other words, my first combination would be:

Marital_status = M, IsBlind = Y, IsDisabled = Y, IsVeteran = Y

The next set would be:

Marital_status = M, IsBlind = Y, IsDisabled = Y, IsVeteran = N

I tried to use a simple combination generator, treating each value of each attribute as an attribute in its own right. It did not work because mutually exclusive choices are included in the combinations, and the number of possible combinations was really huge (133873417996074857185490633899939406700260683726864088366400, to be precise).

Could you please suggest an algorithm (preferably coded in Java)?

Thanks!!

Find another way. If you have 200 variables, and each one has at least 2 choices, you're going to have >= 2^200 combinations. If you generate one combination each nanosecond, it would take about 10^43 years to enumerate 2^200 choices.
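The arithmetic behind that estimate can be checked with a few lines of Java. This is only a rough sketch: the class and method names are made up for illustration, and a year is assumed to be 365 days.

```java
import java.math.BigInteger;

class EnumerationTime {

    // Nanoseconds in a 365-day year; leap days are ignored for this rough estimate.
    static final BigInteger NANOS_PER_YEAR =
            BigInteger.valueOf(365L * 24 * 60 * 60).multiply(BigInteger.valueOf(1_000_000_000L));

    /** Whole years needed to visit 2^variables combinations at one per nanosecond. */
    static BigInteger yearsToEnumerate(int variables) {
        BigInteger combinations = BigInteger.ONE.shiftLeft(variables); // 2^variables
        return combinations.divide(NANOS_PER_YEAR);
    }

    public static void main(String[] args) {
        BigInteger years = yearsToEnumerate(200);
        System.out.println("Years for 2^200 combinations: " + years);
        System.out.println("Number of digits: " + years.toString().length());
    }
}
```

The result has 44 digits, i.e. it is on the order of 10^43 years, which matches the figure above.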

As others (and you yourself) have pointed out, it is impossible to test this exhaustively.

I suggest you take the sampling approach and test with that. You have a strong theoretical background, so you will be able to find and understand the relevant material on the internet.


But let me give a small example. For now, I will ignore possible "clusters" of parameters (parameters that are strongly related).

  • Create the set of single-value samples: one for every possible value of each of your 200 parameters. This exhaustiveness ensures that no parameter value is forgotten.

    It doesn't have to be created upfront; the values can be generated in a loop.

  • To each single-value sample, you need to add values for the other parameters. A simple approach is to choose how many times you want to test each single-value sample (say N = 100). For each one, you would then randomly generate the other values N times.

If there are 1000 possible values across all 200 parameters, and N = 100, that would give us 100K tests.
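The two steps above could be sketched roughly as follows. This is a minimal illustration, not a finished test generator: the class name, the method names and the four-attribute example are assumptions, and the other values are drawn uniformly at random, which a real data set may not justify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class PinnedValueSampler {

    /** Hypothetical four-attribute subset of the data set from the question. */
    static Map<String, List<String>> exampleDomains() {
        Map<String, List<String>> domains = new LinkedHashMap<>();
        domains.put("Marital_status", Arrays.asList("M", "S", "W", "D"));
        domains.put("IsBlind", Arrays.asList("Y", "N"));
        domains.put("IsDisabled", Arrays.asList("Y", "N"));
        domains.put("IsVeteran", Arrays.asList("Y", "N"));
        return domains;
    }

    /**
     * For every (attribute, value) pair, builds n records in which that pair is
     * fixed and every other attribute gets a uniformly random value.
     */
    static List<Map<String, String>> sample(Map<String, List<String>> domains, int n, Random rng) {
        List<Map<String, String>> tests = new ArrayList<>();
        for (Map.Entry<String, List<String>> pinned : domains.entrySet()) {
            for (String pinnedValue : pinned.getValue()) {
                for (int i = 0; i < n; i++) {
                    Map<String, String> record = new LinkedHashMap<>();
                    for (Map.Entry<String, List<String>> attr : domains.entrySet()) {
                        if (attr.getKey().equals(pinned.getKey())) {
                            record.put(attr.getKey(), pinnedValue); // the value under test
                        } else {
                            List<String> values = attr.getValue();
                            record.put(attr.getKey(), values.get(rng.nextInt(values.size())));
                        }
                    }
                    tests.add(record);
                }
            }
        }
        return tests;
    }

    public static void main(String[] args) {
        // A fixed seed makes the generated test set reproducible between runs.
        List<Map<String, String>> tests = sample(exampleDomains(), 3, new Random(42L));
        System.out.println(tests.size() + " records, first one: " + tests.get(0));
    }
}
```

The number of records is (total number of values) x N, so the four attributes here (10 values in total) with N = 3 give 30 records, in the same way that 1000 values with N = 100 give 100K.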


You could elaborate on this basic idea in many ways:

  • If you want your test to be repeatable, you could generate the sets only once, store them, and then reuse the same sets in all future tests.
  • You could control your distribution so that each value gets selected a fair number of times.
  • In real life, the 200 parameters would not all be independent. Many parameters would actually be connected to others, in that the probability of finding their values together is not uniform. Instead of building the initial exhaustive set on only one parameter at a time, as I did previously, I would run the exhaustive set on a cluster of connected parameters.
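Enumerating a small cluster exhaustively is cheap. A possible sketch, assuming the cluster is simply the four attributes shown in the question (the class and method names are hypothetical), uses an odometer-style counter with one index per attribute:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClusterEnumerator {

    /** Hypothetical cluster: the four attributes listed in the question. */
    static List<List<String>> exampleCluster() {
        return Arrays.asList(
                Arrays.asList("M", "S", "W", "D"), // Marital_status
                Arrays.asList("Y", "N"),           // IsBlind
                Arrays.asList("Y", "N"),           // IsDisabled
                Arrays.asList("Y", "N"));          // IsVeteran
    }

    /** Returns every combination that takes exactly one value from each domain. */
    static List<List<String>> enumerate(List<List<String>> domains) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> domain : domains) {
            if (domain.isEmpty()) {
                return result; // an attribute without values allows no combination
            }
        }
        int[] index = new int[domains.size()]; // current position in each domain
        while (true) {
            List<String> combination = new ArrayList<>(domains.size());
            for (int i = 0; i < domains.size(); i++) {
                combination.add(domains.get(i).get(index[i]));
            }
            result.add(combination);

            // Advance like an odometer: bump the last index, carry to the left on overflow.
            int pos = domains.size() - 1;
            while (pos >= 0 && ++index[pos] == domains.get(pos).size()) {
                index[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                return result; // every index wrapped around, so we are done
            }
        }
    }

    public static void main(String[] args) {
        List<List<String>> all = enumerate(exampleCluster());
        System.out.println(all.size() + " combinations");
        System.out.println(all.get(0));
        System.out.println(all.get(1));
    }
}
```

The combinations come out in the order the question describes: the last attribute changes fastest. This only stays practical while the cluster is small, since the result list grows as the product of the domain sizes.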

As Keith pointed out, the number of combinations will be impossibly large if there are no excluded combinations, which would make your need unmeetable. However, since you've already said that you have mutually exclusive choices, the solution space will be smaller.

How much smaller? That depends on how many choices are mutually exclusive. I recommend doing some math on that before going too far.
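As one way of doing that math, here is a small sketch under deliberately simple assumptions that are not from the original post: all attributes are yes/no, the exclusion rules apply to disjoint pairs of attributes, and each rule forbids exactly one of the four value pairs. Each such rule leaves 3 of 4 combinations for its pair, so k rules over n attributes leave 3^k * 2^(n - 2k) combinations. The class and method names are made up.

```java
import java.math.BigInteger;

class ExclusionMath {

    /**
     * Number of combinations left for `attributes` yes/no attributes when `pairs`
     * disjoint pairs of them each have exactly one forbidden value pair.
     */
    static BigInteger remaining(int attributes, int pairs) {
        if (pairs < 0 || 2 * pairs > attributes) {
            throw new IllegalArgumentException("pairs must fit into the attributes");
        }
        BigInteger constrained = BigInteger.valueOf(3).pow(pairs);        // 3 of 4 per pair
        BigInteger free = BigInteger.ONE.shiftLeft(attributes - 2 * pairs); // 2 per free attribute
        return constrained.multiply(free);
    }

    public static void main(String[] args) {
        for (int pairs : new int[] {0, 50, 100}) {
            BigInteger count = remaining(200, pairs);
            System.out.println(pairs + " rules: " + count.toString().length() + " digits");
        }
    }
}
```

Even in the extreme case where all 200 attributes are paired up (100 rules), the count is 3^100, a 48-digit number, so rules of this kind alone do not bring the space within reach.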

Assuming that enough choices are exclusive, you're still going to have to essentially brute-force it, and you're very unlikely to find an existing, useful algorithm.

Which brings me to the question: what's your reason for doing this - exhaustive testing? Sounds good, but you may find that it's not possible. I've encountered this issue myself, and in the end, you may well be forced to use some combination of carefully selected "edge" cases, plus some quasi-randomly selected other cases.

Having read your comment above, it appears you define "mutual exclusion" differently than I do, and I fear that you may have a problem.

So a given patient is not both blind and not blind. Great. But that's not what I (and, I suspect, everyone else here) understood when you mentioned mutual exclusions.

By those, I mean rules such as: if blind, cannot be non-disabled, or something like that.

Without a significant number of mutually exclusive inter-relationships between your attributes that limit their combinations, you will be unable to complete your exhaustive testing.
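To make the effect of such an inter-attribute rule concrete, here is a small sketch that enumerates a four-attribute cluster and skips any branch that violates a rule. The rule "blind implies disabled" is only the illustrative example from above, and the class and method names are hypothetical. A rule is expected to accept partial assignments in which the attributes it mentions are not set yet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ConstrainedEnumerator {

    /** Hypothetical four-attribute subset of the data set from the question. */
    static Map<String, List<String>> exampleDomains() {
        Map<String, List<String>> domains = new LinkedHashMap<>();
        domains.put("Marital_status", Arrays.asList("M", "S", "W", "D"));
        domains.put("IsBlind", Arrays.asList("Y", "N"));
        domains.put("IsDisabled", Arrays.asList("Y", "N"));
        domains.put("IsVeteran", Arrays.asList("Y", "N"));
        return domains;
    }

    /** Illustrative rule: a blind patient cannot be non-disabled. */
    static boolean blindImpliesDisabled(Map<String, String> assignment) {
        return !("Y".equals(assignment.get("IsBlind")) && "N".equals(assignment.get("IsDisabled")));
    }

    /** Enumerates all assignments that satisfy the rule, pruning invalid branches early. */
    static List<Map<String, String>> enumerate(
            Map<String, List<String>> domains, Predicate<Map<String, String>> allowed) {
        List<Map<String, String>> out = new ArrayList<>();
        extend(new ArrayList<>(domains.entrySet()), 0, new LinkedHashMap<>(), allowed, out);
        return out;
    }

    private static void extend(List<Map.Entry<String, List<String>>> attributes, int depth,
            Map<String, String> partial, Predicate<Map<String, String>> allowed,
            List<Map<String, String>> out) {
        if (depth == attributes.size()) {
            out.add(new LinkedHashMap<>(partial)); // copy, because partial is reused
            return;
        }
        Map.Entry<String, List<String>> attribute = attributes.get(depth);
        for (String value : attribute.getValue()) {
            partial.put(attribute.getKey(), value);
            if (allowed.test(partial)) {
                extend(attributes, depth + 1, partial, allowed, out);
            }
            partial.remove(attribute.getKey()); // backtrack before trying the next value
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> domains = exampleDomains();
        System.out.println("Without rule: " + enumerate(domains, a -> true).size());
        System.out.println("With rule:    "
                + enumerate(domains, ConstrainedEnumerator::blindImpliesDisabled).size());
    }
}
```

For this cluster the rule reduces 32 combinations to 24 (4 x 3 x 2). A single rule only removes a quarter of the space, which is why a significant number of such relationships is needed before exhaustive testing becomes feasible.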


 