I get different values for different runs. What am I doing wrong here?
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
X = np.random.random((100, 5))
y = np.random.randint(0, 2, (100,))
clf = RandomForestClassifier()
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42321429 0.44360902 0.34398496]
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42678571 0.46804511 0.36090226]
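The mistake you are making is creating the RandomForestClassifier with its default random_state, which is None. In that case it draws from the global np.random generator, whose state advances on every call, so each fit builds a different forest and yields different scores.
The random_state of RandomForestClassifier needs to be fixed to an integer (and that of StratifiedKFold as well, if it shuffles) in order to produce equal arrays of cross-validation scores. The two values do not have to be equal to each other.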
Illustration:
X = np.random.random((100, 5))
y = np.random.randint(0, 2, (100,))
clf = RandomForestClassifier(random_state=1)
# random_state only has an effect here because shuffle=True; without shuffling the folds are deterministic anyway
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
## [ 0.57612457 0.29044118 0.30514706]
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
## [ 0.57612457 0.29044118 0.30514706]
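As far as I can tell, the two seeds only have to be fixed, not equal to each other. Here is a minimal sketch of that, assuming a recent scikit-learn with the model_selection API; the helper name run, the seed values and n_estimators=10 are my own choices for illustration, not something from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy data from a local generator (the seed 0 is an arbitrary assumption)
rng = np.random.RandomState(0)
X = rng.random_sample((100, 5))
y = rng.randint(0, 2, (100,))


def run(clf_seed, cv_seed):
    # Hypothetical helper: both sources of randomness get their own fixed seed
    clf = RandomForestClassifier(n_estimators=10, random_state=clf_seed)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=cv_seed)
    return cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)


# Different seeds for the forest and the splitter, same seeds across calls
a = run(clf_seed=1, cv_seed=42)
b = run(clf_seed=1, cv_seed=42)
print(np.array_equal(a, b))
```

The forest and the splitter each consume their own seed, so what matters is that each one gets the same integer on every call.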
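Another way of countering it would be to leave random_state unset for both RFC and SKF and instead call np.random.seed(value) at the beginning, before the random integers are created. This makes the output of the whole script reproducible from run to run. Note that two successive cross_val_score calls within the same run will still differ unless the seed is reset before each of them, because every call advances the global generator.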
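A rough sketch of the global-seed approach, assuming a recent scikit-learn; the helper name seeded_scores, the seed value and n_estimators=10 are hypothetical choices of mine. The seed is reset inside the helper, so every call starts from the same global state.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def seeded_scores(seed):
    # Reset NumPy's global generator; the data and the forest both draw from it
    np.random.seed(seed)
    X = np.random.random((100, 5))
    y = np.random.randint(0, 2, (100,))
    clf = RandomForestClassifier(n_estimators=10)  # random_state left as None
    cv = StratifiedKFold(n_splits=3)               # no shuffling, so no seed needed
    return cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)


s1 = seeded_scores(0)
s2 = seeded_scores(0)
print(np.array_equal(s1, s2))
```

This relies on everything running sequentially in one process and on nothing else touching np.random in between, so passing explicit random_state values is usually the more robust option.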