I'm trying to resample my sample data to calculate a bootstrap standard error, but the results don't match the probabilities I specified.
For the p argument of numpy.random.choice(a, size=None, replace=True, p=None), I assigned the list of probabilities
[0.190872103, 0.120820803, 0.115160092, 0.008137272, 0.029541836, 0.0, 0.535467893, 0.0] for ['neutral', 'happy', 'sad', 'surprise', 'fear', 'disgust', 'anger', 'contempt'], respectively.
import numpy as np
import pandas as pd

# path is assumed to be defined earlier (directory containing the CSV)
data = pd.read_csv(path + 'shawshank_FER_entropy.csv', encoding='utf-8', delimiter='\t')
emo_list = ['neutral', 'happy', 'sad', 'surprise', 'fear', 'disgust', 'anger', 'contempt']
pb = data.andy
# rows 11 through 18 of the 'andy' column hold the eight probabilities
p = [float(x) for x in pb.iloc[11:19]]
print(p)
emo_sample = np.random.choice(emo_list, 1000, p)
print(emo_sample)
unique, counts = np.unique(emo_sample, return_counts=True)
print(np.asarray((unique, counts)).T)
I expected 1000 emotion words distributed according to the probabilities I specified, but the results are roughly uniformly distributed, as shown below.
[['anger' '128']
 ['contempt' '140']
 ['disgust' '101']
 ['fear' '134']
 ['happy' '121']
 ['neutral' '120']
 ['sad' '123']
 ['surprise' '133']]
Can you explain why my code doesn't use the probabilities I specified?
The call signature of numpy.random.choice is:
numpy.random.choice(a, size=None, replace=True, p=None)
Notice that p is the 4th parameter, not the 3rd. So emo_sample = np.random.choice(emo_list, 1000, p) assigns p to the replace parameter instead of the p parameter:
numpy.random.choice(a, size=None, replace=p, p=None)
One way to fix this is to use keyword parameters:
emo_sample = np.random.choice(emo_list, 1000, p=p)
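To make the difference visible, here is a minimal sketch that runs both calls side by side using the probabilities from the question. The seed, the normalization step, and the helper count_labels are my own additions for illustration, not part of the original code. As far as I can tell, the positional call does not raise an error because a non-empty list is truthy, so it is treated like replace=True while p stays None, which means uniform sampling.

```python
import numpy as np

emo_list = ['neutral', 'happy', 'sad', 'surprise', 'fear', 'disgust', 'anger', 'contempt']
p = [0.190872103, 0.120820803, 0.115160092, 0.008137272,
     0.029541836, 0.0, 0.535467893, 0.0]

# Precaution (an addition, not in the question): rescale so the
# probabilities sum to 1 despite rounding in the printed values.
total = sum(p)
p = [x / total for x in p]

np.random.seed(0)  # arbitrary seed, only so repeated runs give the same draw

# Positional: the list lands in the `replace` slot, so `p` stays None
# and every label is drawn with equal probability.
uniform_sample = np.random.choice(emo_list, 1000, p)

# Keyword: the list is bound to `p`, so the weights are actually used.
weighted_sample = np.random.choice(emo_list, 1000, p=p)


def count_labels(sample):
    """Return a {label: count} dict for a sample (hypothetical helper)."""
    unique, counts = np.unique(sample, return_counts=True)
    return dict(zip(unique.tolist(), counts.tolist()))


uniform_counts = count_labels(uniform_sample)
weighted_counts = count_labels(weighted_sample)
print(uniform_counts)
print(weighted_counts)
```

With the keyword call, 'disgust' and 'contempt' never appear because their probability is 0, and 'anger' makes up a little over half of the sample. Passing size and replace by keyword as well would avoid this kind of slip entirely.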