How to use cross_val_score with random_state

Question

I get different values for different runs. What am I doing wrong here?

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.random((100,5))
y = np.random.randint(0,2,(100,))
clf = RandomForestClassifier()
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42321429  0.44360902  0.34398496]

s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42678571  0.46804511  0.36090226]

Answer 1

The problem is that you create the RandomForestClassifier with its default random_state, which is None. In that case the forest draws its randomness from NumPy's global random state (np.random), so every fit builds different trees and every run gives different scores.

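To get identical arrays of cross-validation scores, every source of randomness has to be fixed: the random_state of the RandomForestClassifier and, if the folds are shuffled, the random_state of the StratifiedKFold. The two seeds do not have to be equal to each other; each one just has to stay the same from run to run.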

Illustration:

X = np.random.random((100, 5))
y = np.random.randint(0, 2, (100,))

clf = RandomForestClassifier(random_state=1)   # fixed seed makes the forest reproducible
cv = StratifiedKFold(n_splits=3)               # no shuffling, so the folds are deterministic and random_state is not needed
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
##[ 0.57612457  0.29044118  0.30514706]
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
##[ 0.57612457  0.29044118  0.30514706]
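
If the folds are shuffled, the splitter becomes a second source of randomness and needs its own seed. Here is a minimal sketch of that case, assuming a recent scikit-learn where StratifiedKFold takes n_splits and shuffle. The helper run_cv, the seed values and n_estimators=20 are only illustrative choices, not part of the question. With both seeds fixed, the two score arrays should come out identical even though the seeds differ from each other.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Local generator, used only to build toy data (illustrative assumption)
rng = np.random.RandomState(0)
X = rng.random_sample((100, 5))
y = rng.randint(0, 2, 100)

def run_cv(clf_seed, cv_seed):
    # Hypothetical helper: builds fresh objects so nothing is shared between calls
    clf = RandomForestClassifier(n_estimators=20, random_state=clf_seed)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=cv_seed)
    return cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)

first = run_cv(clf_seed=1, cv_seed=42)
second = run_cv(clf_seed=1, cv_seed=42)
print(first)
print(second)
print(np.allclose(first, second))   # compares the two score arrays
```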

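Another option is to leave random_state unset on both the RandomForestClassifier and the StratifiedKFold and instead call np.random.seed(value) once at the beginning, before the random data is created. An estimator whose random_state is None draws from NumPy's global random state, so seeding it makes the script produce the same score arrays on every run. Note that two successive cross_val_score calls inside the same script will only match if the seed is reset before each call, because every call advances the global state.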
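
A rough sketch of the np.random.seed approach, in case it helps. The helper seeded_scores and the value n_estimators=20 are my own illustrative choices. It reseeds the global generator before doing anything else, so calling it twice with the same seed should give matching arrays.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def seeded_scores(seed):
    # Hypothetical helper: reseed NumPy's global generator, then do everything else
    np.random.seed(seed)
    X = np.random.random((100, 5))
    y = np.random.randint(0, 2, (100,))
    clf = RandomForestClassifier(n_estimators=20)   # random_state left as None
    cv = StratifiedKFold(n_splits=3)                # no shuffling, so the folds are deterministic
    return cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)

a = seeded_scores(0)
b = seeded_scores(0)
print(a)
print(b)
print(np.allclose(a, b))   # compares the two score arrays
```

This relies on global state, so any other code that draws from np.random between the seed call and the fit will change the result. Passing random_state explicitly is usually the more robust choice.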

solution1
10 ACCEPTED 2016-09-30 09:00:31
