I have a sample DataFrame read in with pandas. It has two columns: 'item' and 'label'. When I shuffle the rows, I want to make sure that no two consecutive rows in the shuffled df have the same label. For example, this is acceptable, because no label appears in two consecutive rows:
1: fire, 'a'
2: smoke, 'b'
3: honey bee, 'a'
4: curtain, 'c'
but I want to avoid a result where the same label appears in consecutive rows, e.g.:
fire, 'a'
honey bee, 'a'
smoke, 'b'
curtain, 'c'
So far, I can shuffle using:
df = df.sample(frac=1).reset_index(drop=True)
I have a vague idea of reshuffling in a loop until df['label'][i+1] != df['label'][i] holds for every i, but I'm not sure exactly how to do it. Any pointers or easier suggestions would be appreciated!
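The adjacent-label check described above can also be written without an explicit loop. This is only a sketch: the helper name has_adjacent_repeats is made up for illustration, and it assumes the labels live in a column called 'label'. It compares each label with the one in the previous row using Series.shift:

```python
import pandas as pd

def has_adjacent_repeats(df, column="label"):
    """Return True if any two consecutive rows share the same value in `column`.

    Hypothetical helper, not part of pandas.
    """
    labels = df[column]
    # shift() moves every label down one row, so this compares row i with row i-1;
    # the first row is compared with NaN, which never counts as equal
    return bool((labels == labels.shift()).any())

# the two orderings from the question
ok = pd.DataFrame({"item": ["fire", "smoke", "honey bee", "curtain"],
                   "label": ["a", "b", "a", "c"]})
bad = pd.DataFrame({"item": ["fire", "honey bee", "smoke", "curtain"],
                    "label": ["a", "a", "b", "c"]})
```

Here has_adjacent_repeats(ok) should be False and has_adjacent_repeats(bad) should be True.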
Thanks for the comments/pointers. I got it to work by:
import pandas as pd

# xlistbase is the original DataFrame read in from the file
randomized = False
while not randomized:
    # reshuffle all rows and renumber the index from 0
    xlist = xlistbase.sample(frac=1).reset_index(drop=True)
    # check for repeats: accept the shuffle only if no two adjacent rows share a label
    for i in range(len(xlist)):
        if i == len(xlist) - 1:
            # reached the last row without finding a repeat
            randomized = True
        elif xlist['label'][i] == xlist['label'][i+1]:
            # repeat found, so give up on this shuffle and try again
            break
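One caveat with a reshuffle-until-valid loop is that it never ends if no valid ordering exists, for example when one label covers more than half of the rows. Below is a possible variant, offered as a sketch rather than a definitive solution. The function name shuffle_no_adjacent and the parameters max_tries and seed are my own assumptions, not something from pandas. It first checks whether a valid ordering can exist at all, then limits the number of attempts:

```python
import pandas as pd

def shuffle_no_adjacent(df, column="label", max_tries=1000, seed=None):
    """Shuffle rows until no two consecutive rows share the same `column` value.

    Hypothetical helper; `max_tries` and `seed` are illustrative parameters.
    """
    n = len(df)
    # If a single label fills more than ceil(n / 2) rows, two of its rows
    # must end up next to each other, so no valid ordering exists.
    if n and df[column].value_counts().max() > (n + 1) // 2:
        raise ValueError("one label is too frequent to avoid adjacent repeats")

    for attempt in range(max_tries):
        # use a different but reproducible random state per attempt when seeded
        state = None if seed is None else seed + attempt
        shuffled = df.sample(frac=1, random_state=state).reset_index(drop=True)
        labels = shuffled[column]
        # compare each label with the one in the previous row
        if not (labels == labels.shift()).any():
            return shuffled

    raise RuntimeError(f"no valid shuffle found in {max_tries} tries")

# sample data from the question
xlistbase = pd.DataFrame({"item": ["fire", "smoke", "honey bee", "curtain"],
                          "label": ["a", "b", "a", "c"]})
xlist = shuffle_no_adjacent(xlistbase, seed=0)
```

Plain rejection sampling like this is fine for small frames with many distinct labels. When valid orderings are rare, it can take many tries, and a constructive approach that places rows one label at a time may be a better fit.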