I am using the following code to get the probability of class 1.
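from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
iris = datasets.load_iris()
X = iris.data
y = iris.target
clf = RandomForestClassifier(n_estimators=10, random_state=42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# one row of class probabilities per sample, each predicted by a model that did not train on it
proba = cross_val_predict(clf, X, y, cv=k_fold, method='predict_proba')
# print probability of class 1
print(proba[:, 1])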
My result looks as follows.
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.1 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0.2 0. 0. 0. 0. 0.1 0. 0. 0. 0. 0. 0. 0. 0. 0.9 1. 0.7 1.
1. 1. 1. 0.7 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.9 0.9 0.1 1.
0.6 1. 1. 1. 0.9 0. 1. 1. 1. 1. 1. 0.4 0.9 0.9 1. 1. 1. 0.9
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.9 0.
0.1 0. 0. 0. 0. 0. 0. 0. 0.1 0. 0. 0.8 0. 0.1 0. 0.1 0. 0.1
0.3 0.2 0. 0.6 0. 0. 0. 0.6 0.4 0. 0. 0. 0.8 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. ]
However, this is only a list of probabilities, which makes the results hard to interpret.
Suppose I also have a list of names, one for each data point in the
iris dataset, as follows (the iris dataset has 150 data points).
iris_names = ['iris_0', 'iris_1', 'iris_2', 'iris_3', 'iris_4', 'iris_5', 'iris_6', 'iris_7', 'iris_8', 'iris_9', 'iris_10', 'iris_11', 'iris_12', 'iris_13', 'iris_14', 'iris_15', 'iris_16', 'iris_17', 'iris_18', 'iris_19', 'iris_20', 'iris_21', 'iris_22', 'iris_23', 'iris_24', 'iris_25', 'iris_26', 'iris_27', 'iris_28', 'iris_29', 'iris_30', 'iris_31', 'iris_32', 'iris_33', 'iris_34', 'iris_35', 'iris_36', 'iris_37', 'iris_38', 'iris_39', 'iris_40', 'iris_41', 'iris_42', 'iris_43', 'iris_44', 'iris_45', 'iris_46', 'iris_47', 'iris_48', 'iris_49', 'iris_50', 'iris_51', 'iris_52', 'iris_53', 'iris_54', 'iris_55', 'iris_56', 'iris_57', 'iris_58', 'iris_59', 'iris_60', 'iris_61', 'iris_62', 'iris_63', 'iris_64', 'iris_65', 'iris_66', 'iris_67', 'iris_68', 'iris_69', 'iris_70', 'iris_71', 'iris_72', 'iris_73', 'iris_74', 'iris_75', 'iris_76', 'iris_77', 'iris_78', 'iris_79', 'iris_80', 'iris_81', 'iris_82', 'iris_83', 'iris_84', 'iris_85', 'iris_86', 'iris_87', 'iris_88', 'iris_89', 'iris_90', 'iris_91', 'iris_92', 'iris_93', 'iris_94', 'iris_95', 'iris_96', 'iris_97', 'iris_98', 'iris_99', 'iris_100', 'iris_101', 'iris_102', 'iris_103', 'iris_104', 'iris_105', 'iris_106', 'iris_107', 'iris_108', 'iris_109', 'iris_110', 'iris_111', 'iris_112', 'iris_113', 'iris_114', 'iris_115', 'iris_116', 'iris_117', 'iris_118', 'iris_119', 'iris_120', 'iris_121', 'iris_122', 'iris_123', 'iris_124', 'iris_125', 'iris_126', 'iris_127', 'iris_128', 'iris_129', 'iris_130', 'iris_131', 'iris_132', 'iris_133', 'iris_134', 'iris_135', 'iris_136', 'iris_137', 'iris_138', 'iris_139', 'iris_140', 'iris_141', 'iris_142', 'iris_143', 'iris_144', 'iris_145', 'iris_146', 'iris_147', 'iris_148', 'iris_149']
Now I want to sort my cross_val_predict results for class 1 and pair each probability with its iris name.
So, my expected output is as follows.
sorted_probability_of_class_1 = [[iris_xxx, 1], [iris_xxx, 1], ........, [iris_xxx, 0.9], [iris_xxx, 0.8], ........, [iris_xxx, 0], [iris_xxx, 0]]
How can I do this? Are the probabilities returned by cross_val_predict
in the same order as the original data points?
I am happy to provide more details if needed.
You can merge both lists into one using zip(), putting the name first to match your expected output:
merged_list = list(zip(iris_names, proba[:, 1]))
zip() accepts a NumPy array directly, so converting it is optional; proba[:, 1].tolist() gives plain Python floats if you prefer them. Here is a more readable example of zip():
>>> probabilities = [1, 2, 3, 0]
>>> labels = ['a', 'b', 'c', 'd']
>>> list(zip(labels, probabilities))
[('a', 1), ('b', 2), ('c', 3), ('d', 0)]
The zipped list can then be sorted using sorted(iterable, key=...) with itemgetter:
>>> from operator import itemgetter
>>> merged_list = list(zip(labels, probabilities))
>>> merged_list
[('a', 1), ('b', 2), ('c', 3), ('d', 0)]
>>> sorted(merged_list, key=itemgetter(1))
[('d', 0), ('a', 1), ('b', 2), ('c', 3)]
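itemgetter(1) accesses the second element of each tuple in the list, which here is the probability. sorted() returns ascending order by default; pass reverse=True to put the highest probabilities first, as in your expected output. This may need to be adjusted depending on your working code.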
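Putting the pieces together, here is a minimal sketch of the whole step. As far as I know, cross_val_predict returns one prediction per input sample in the same order as X, so proba[i] should line up with iris_names[i]. The helper name rank_by_probability and the demo values are made up for illustration; with the code from the question you would pass iris_names and proba[:, 1] instead.

```python
from operator import itemgetter


def rank_by_probability(names, probabilities):
    """Pair each name with its probability and sort highest first.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if len(names) != len(probabilities):
        raise ValueError("names and probabilities must have the same length")
    # float() turns NumPy scalars into plain Python floats.
    pairs = [[name, float(p)] for name, p in zip(names, probabilities)]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(pairs, key=itemgetter(1), reverse=True)


# Stand-in values; replace with iris_names and proba[:, 1].
demo_names = ['iris_0', 'iris_1', 'iris_2', 'iris_3']
demo_probs = [0.0, 0.9, 1.0, 0.9]
ranked = rank_by_probability(demo_names, demo_probs)
print(ranked)
# [['iris_2', 1.0], ['iris_1', 0.9], ['iris_3', 0.9], ['iris_0', 0.0]]
```

The length check is there because zip() silently stops at the shorter input, which would hide a mismatch between the names and the predictions.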