
How to use cross_val_score with random_state

I get different values for different runs. What am I doing wrong here?

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.random((100, 5))
y = np.random.randint(0, 2, (100,))
clf = RandomForestClassifier()  # random_state is left at its default, None
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42321429  0.44360902  0.34398496]

s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42678571  0.46804511  0.36090226]

The problem is that RandomForestClassifier is created with its default random_state=None. In that case the forest draws its randomness from NumPy's global random state (np.random), so every fit builds a different set of trees and the scores change from call to call.

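To get identical arrays of cross-validation scores, every source of randomness has to be fixed: set random_state on RandomForestClassifier, and on StratifiedKFold as well if the folds are shuffled. The two values do not have to be equal; each one just has to be fixed.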

Illustration:

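import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.random((100, 5))
y = np.random.randint(0, 2, (100,))

clf = RandomForestClassifier(random_state=1)  # fixed seed, so the same trees are built every time
cv = StratifiedKFold(n_splits=3)              # folds are not shuffled, so random_state is not necessary here
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
##[ 0.57612457  0.29044118  0.30514706]

s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)  # second call, same scores
print(s)
##[ 0.57612457  0.29044118  0.30514706]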
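
The remark that StratifiedKFold does not need a random_state holds only while the folds are not shuffled. As a rough sketch, assuming a recent scikit-learn in which StratifiedKFold takes n_splits and shuffle, the snippet below compares the test indices produced by two splitters. The helper name fold_test_indices and the seed values are made up for this illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)  # local generator; the seed 0 is an arbitrary choice
X = rng.random_sample((100, 5))
y = rng.randint(0, 2, (100,))


def fold_test_indices(cv):
    """Return the test indices of every fold produced by the splitter (hypothetical helper)."""
    return [test.tolist() for _, test in cv.split(X, y)]


# Shuffling splitter with a fixed seed: the folds are the same on every call.
seeded_a = fold_test_indices(StratifiedKFold(n_splits=3, shuffle=True, random_state=1))
seeded_b = fold_test_indices(StratifiedKFold(n_splits=3, shuffle=True, random_state=1))
print(seeded_a == seeded_b)

# No shuffling: the folds depend only on y, so no seed is needed.
plain_a = fold_test_indices(StratifiedKFold(n_splits=3))
plain_b = fold_test_indices(StratifiedKFold(n_splits=3))
print(plain_a == plain_b)
```

If shuffle=True were used with random_state left at None, the folds themselves could change between calls, which would add a second source of variation on top of the forest.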

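Another option is to leave random_state unset on both RandomForestClassifier and StratifiedKFold and call np.random.seed(value) once at the top of the script, before the random data is generated. Because random_state=None falls back to NumPy's global random state, every run of the script then produces the same arrays of scores. Note that two successive calls within the same run will still differ unless the seed is reset before each call.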
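
A minimal sketch of this seeding approach, assuming a recent scikit-learn and the default single-process execution. The function name seeded_scores, the seed 0, and n_estimators=10 are choices made for this example, not part of the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def seeded_scores(seed):
    """Reseed NumPy's global RNG, then cross-validate with no random_state set anywhere."""
    np.random.seed(seed)  # drives both the random data and the forest
    X = np.random.random((100, 5))
    y = np.random.randint(0, 2, (100,))
    clf = RandomForestClassifier(n_estimators=10)  # random_state stays None
    cv = StratifiedKFold(n_splits=3)               # unshuffled, deterministic folds
    return cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)


first = seeded_scores(0)
second = seeded_scores(0)
print(np.array_equal(first, second))  # the two arrays are expected to match
```

Passing an explicit random_state to the estimator is usually the more robust choice, since a global seed can be consumed by any other code that also draws from np.random.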


 