I'm currently trying to randomize a list of 0s and 1s which should give a random order of zeros and ones with the following constraints:
1/3 of the items have to be 1s (so the remaining 2/3 are 0s)
No more than two 1s should occur consecutively
I have worked on an option, but it did not exactly turn out to be what I need. Here's my option:
for prevItem, nextItem in enumerate(WordV[:-1]):
    if nextItem == WordV[prevItem+1] and WordV[prevItem+1] == WordV[prevItem+2] and nextItem == 1:
        WordV[prevItem+2] = 0
    if nextItem == WordV[prevItem+1] and WordV[prevItem+1] == WordV[prevItem+2] and WordV[prevItem+2] == WordV[prevItem+3] and WordV[prevItem+3] == WordV[prevItem+4] and nextItem == 0:
        WordV[prevItem+2] = 1
# Check the number of ones & zeros
print(WordV)
ones= WordV.count(1)
zeros= WordV.count(0)
print(ones, zeros)
Currently, the number of ones and zeros does not add up to a proportion of 1/3 to 2/3 because the constraints replace numbers. The WordV list is a list containing 24 ones and 48 zeros that is shuffled randomly (with random.shuffle(WordV)).
Is there a smarter (and more correct) way to integrate the constraints into the code?
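import numpy as np

def consecutive(data, stepsize=0):
    # split the array wherever two neighbouring values differ
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

def check(list_to_check):
    # return True if any run is too long (more than two 1s or more than four 0s)
    groups = consecutive(list_to_check)
    for group in groups:
        if group[0] == 1 and group.size > 2:
            return True
        if group[0] == 0 and group.size > 4:
            return True
    return False

# the unshuffled array violates the constraints, so the loop shuffles at least once
wordv = np.array([1]*24+[0]*48)
while check(wordv):
    np.random.shuffle(wordv)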
wordv will contain something like:
array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 1, 0])
The consecutive function will split the data in groups containing the same element:
[ins] In [32]: consecutive([1,1,1,0,0,1])
Out[32]: [array([1, 1, 1]), array([0, 0]), array([1])]
The check function tests both conditions you specified, and the list is reshuffled until it meets them.
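For comparison, the same reshuffle-until-valid idea can be sketched without NumPy. This is only a minimal sketch: the names `is_valid`, `rejection_shuffle` and `MAX_RUN` are made up for illustration, and the limit of four consecutive 0s is an assumption taken from the check above rather than from the constraints listed in the question.

```python
import itertools
import random

# Longest allowed run per value (assumption: at most two 1s and four 0s in a row).
MAX_RUN = {1: 2, 0: 4}

def is_valid(seq, max_run=MAX_RUN):
    """Return True if no run of equal values is longer than allowed."""
    return all(len(list(group)) <= max_run[value]
               for value, group in itertools.groupby(seq))

def rejection_shuffle(n_ones=24, n_zeros=48, seed=None):
    """Shuffle until the constraints hold; return the list and the attempt count."""
    rng = random.Random(seed)
    seq = [1] * n_ones + [0] * n_zeros
    attempts = 0
    while True:
        attempts += 1
        rng.shuffle(seq)
        if is_valid(seq):
            return seq, attempts

seq, attempts = rejection_shuffle(seed=0)
print(seq.count(1), seq.count(0), attempts)
```

Because the list is only ever shuffled, never edited, the 1/3 to 2/3 proportion is preserved by construction; the price is that an unknown number of shuffles is thrown away.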
You could try an optimization approach: Start with the list holding the elements in the right proportion, then keep swapping random elements until you get the desired results. In each turn, check the number of too-long streaks of 0s or 1s and always keep the better one of the original or the mutated list.
import itertools, random

def penalty(lst):
    return sum(1 for k, g in itertools.groupby(lst)
               if k == 0 and len(list(g)) > 4 or k == 1 and len(list(g)) > 2)

def constrained_shuffle(lst):
    # penalty of original list
    p = penalty(lst)
    while p > 0:
        # randomly swap two elements, get new penalty
        a, b = random.randrange(len(lst)), random.randrange(len(lst))
        lst[a], lst[b] = lst[b], lst[a]
        p2 = penalty(lst)
        if p2 > p:
            # worse than before, swap back
            lst[a], lst[b] = lst[b], lst[a]
        else:
            p = p2

lst = [0] * 20 + [1] * 10
random.shuffle(lst)
constrained_shuffle(lst)
print(lst)
For 200 0s and 100 1s this will take a few hundred to a few thousand iterations until it finds a valid list, which is okay. For lists with thousands of elements this is rather too slow, but it could probably be improved by remembering the positions of the too-long streaks and preferably swapping elements within those.
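The improvement hinted at above could look roughly like the following. It is an untested-for-bias sketch, not a tuned implementation: `bad_positions`, `targeted_shuffle`, `LIMITS` and the `max_iters` guard are names and parameters I made up, and the positions are recomputed on every iteration rather than cached. One of the two swapped elements is always taken from inside a too-long streak.

```python
import itertools
import random

LIMITS = {0: 4, 1: 2}  # longest allowed streak per value, same limits as penalty()

def penalty(lst):
    """Number of streaks that are longer than allowed."""
    return sum(1 for value, group in itertools.groupby(lst)
               if len(list(group)) > LIMITS[value])

def bad_positions(lst):
    """Indices of all elements that sit inside a too-long streak."""
    positions, start = [], 0
    for value, group in itertools.groupby(lst):
        length = len(list(group))
        if length > LIMITS[value]:
            positions.extend(range(start, start + length))
        start += length
    return positions

def targeted_shuffle(lst, rng=random, max_iters=1_000_000):
    """Like constrained_shuffle, but one swap partner comes from a bad streak."""
    p = penalty(lst)
    iterations = 0
    while p > 0:
        if iterations >= max_iters:
            raise RuntimeError("no valid list found; constraints may be infeasible")
        iterations += 1
        a = rng.choice(bad_positions(lst))   # element inside a too-long streak
        b = rng.randrange(len(lst))          # any other position
        lst[a], lst[b] = lst[b], lst[a]
        p2 = penalty(lst)
        if p2 > p:
            lst[a], lst[b] = lst[b], lst[a]  # worse than before, swap back
        else:
            p = p2
    return iterations

lst = [0] * 200 + [1] * 100
random.shuffle(lst)
print(targeted_shuffle(lst), penalty(lst))
```

Picking the swap candidates non-uniformly may well change how evenly the valid lists are sampled, so the distribution test would have to be repeated for this variant.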
About the "randomness" of the approach: Of course, it is less random than just repeatedly generating a new shuffled list until one fits the constraints, but I don't see how this will create a bias for or against certain lists, as long as those satisfy the constraints. I did a short test, repeatedly generating shuffled lists and counting how often each variant appears:
import collections

counts = collections.Counter()
for _ in range(10000):
    lst = [0] * 10 + [1] * 5
    random.shuffle(lst)
    constrained_shuffle(lst)
    counts[tuple(lst)] += 1
# how many distinct lists appeared exactly n times, as (n, number of lists) pairs
print(collections.Counter(counts.values()).most_common())
[(7, 197), (6, 168), (8, 158), (9, 157), (5, 150), (10, 98), (4, 92),
(11, 81), (12, 49), (3, 49), (13, 43), (14, 23), (2, 20), (15, 10),
(1, 8), (16, 4), (17, 3), (18, 1)]
So, yes, maybe there are a few lists that are more likely than others (one appeared 18 times, three 17 times, and most others 5-9 times). For 100,000 iterations, the "more likely" lists appear ~50% more often than the others, but still only about 120 times out of those 100,000 iterations, so I'd think that this is not too much of a problem.
Without the initial random.shuffle(lst), there are more lists that appear much more often than the average, so this step should not be skipped.
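To put those counts into perspective, it helps to know how many valid lists exist at all: a perfectly uniform sampler would produce each of them about 10000 / n_valid times. For 10 zeros and 5 ones this is small enough to brute-force. The helper names `is_valid` and `count_valid` below are mine, not part of the code above, and the limits are the same ones `penalty` uses.

```python
import itertools

LIMITS = {0: 4, 1: 2}  # longest allowed streak per value, as in penalty()

def is_valid(seq):
    """True if no streak of equal values is longer than allowed."""
    return all(len(list(group)) <= LIMITS[value]
               for value, group in itertools.groupby(seq))

def count_valid(n_zeros=10, n_ones=5):
    """Count all arrangements of the given 0s and 1s that satisfy the limits."""
    n = n_zeros + n_ones
    valid = 0
    # every arrangement is determined by the positions of the 1s
    for ones_at in itertools.combinations(range(n), n_ones):
        seq = [0] * n
        for i in ones_at:
            seq[i] = 1
        if is_valid(seq):
            valid += 1
    return valid

n_valid = count_valid()
print(n_valid, 10000 / n_valid)
```

Comparing the printed average with the observed spread of counts gives a rough feeling for how far the swap-based shuffle is from uniform; this only works for short lists, since the number of arrangements grows very quickly.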
I don't really know Python, so I'll give you pseudocode:
int length;
int[] onesAndZeros = new int[length];
for(int i = 0; i < length; i++) { // generate a random list
    onesAndZeros[i] = random(0, 1);
}
int zeroCount() { // correct the ratio
    int c = 0;
    for(int i: onesAndZeros) {
        if(i == 0) {
            c++;
        }
    }
    return c;
}
int wantedZeros = 2 * length / 3; // two thirds of the items should be zeros (assumes length is divisible by 3)
while(zeroCount() != wantedZeros) {
    boolean isLess = zeroCount() < wantedZeros;
    if(isLess) {
        onesAndZeros[random(0, length - 1)] = 0; // too few zeros: turn a random item into a 0
    } else {
        onesAndZeros[random(0, length - 1)] = 1; // too many zeros: turn a random item into a 1
    }
}
string isCorrect() { // fix the 2 1s and 4 0s
    for(int i = 0; i < length; i++) {
        if(i + 4 < length && // be sure not to go out of bounds!
           onesAndZeros[i] == 0 &&
           onesAndZeros[i + 1] == 0 &&
           onesAndZeros[i + 2] == 0 &&
           onesAndZeros[i + 3] == 0 &&
           onesAndZeros[i + 4] == 0) {
            return "0" + i;
        } else if(i + 2 < length &&
           onesAndZeros[i] == 1 &&
           onesAndZeros[i + 1] == 1 &&
           onesAndZeros[i + 2] == 1) {
            return "1" + i;
        }
    }
    return "a"; // no too-long run found anywhere
}
void fix(int type, int idx) {
    if(type == 0) {
        onesAndZeros[idx + 4] = 1;
    } else {
        onesAndZeros[idx + 2] = 0;
    }
}
string corr = isCorrect();
while(length(corr) >= 2) { // note: this step will screw up the ones/zeros ratio a bit, if you want to restore it, consider running the last 2 steps again
    if(corr[0] == '0') {
        fix(0, toInt(removeFirstChar(corr)));
    } else {
        fix(1, toInt(removeFirstChar(corr)));
    }
    corr = isCorrect(); // look for the next violation, otherwise the loop never ends
}
// done!
I'm well aware that this can be greatly optimized and cleaned up, depending on the language. But this is more of a solid base to build upon.
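Since the question is about Python, the find-and-fix part of the pseudocode could be translated roughly as follows. This is only my reading of it: `first_violation` and `repair_runs` are hypothetical names, the function works on a copy instead of a global array, and, as noted in the pseudocode, the repair can change the number of ones and zeros.

```python
def first_violation(seq):
    """Return (value, start) of the first too-long run, or None if there is none."""
    for i in range(len(seq)):
        if seq[i:i + 5] == [0] * 5:   # five 0s in a row
            return 0, i
        if seq[i:i + 3] == [1] * 3:   # three 1s in a row
            return 1, i
    return None

def repair_runs(seq):
    """Break too-long runs by flipping one element per violation."""
    seq = list(seq)  # work on a copy
    while (found := first_violation(seq)) is not None:
        value, start = found
        if value == 0:
            seq[start + 4] = 1   # fifth 0 of the run becomes a 1
        else:
            seq[start + 2] = 0   # third 1 of the run becomes a 0
    return seq

before = [1, 1, 1, 0, 0, 0]
after = repair_runs(before)
print(before.count(1), after.count(1))  # the number of 1s is not preserved
```

Each fix only touches an element at or after the start of the first violation, so the valid prefix keeps growing and the loop finishes; restoring the 1/3 to 2/3 proportion afterwards is a separate step.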