Create a confusion matrix

First note: I am building a recommender system, which should later suggest articles to the user that they might also like.

I'm in the process of creating a confusion matrix. Unfortunately I am not able to make it work; I'm getting an error. I have attached an example below, but unfortunately I don't know how to rebuild my code. How do I have to rebuild it to get such a "nice" confusion matrix as in the example?

Dataframe:
d = {'purchaseid': [0, 0, 0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9, 9, 9],
     'itemid':     [3, 8, 2, 10, 3, 10, 4, 12, 3, 12, 3, 4, 8, 6, 3, 0, 5, 12, 9, 9, 13, 1, 7, 11, 11]}
df = pd.DataFrame(data=d)

   purchaseid  itemid
0           0       3
1           0       8
2           0       2
3           1      10
4           2       3
...       ...     ...
Code:
PERCENTAGE_SPLIT = 20
NUM_NEGATIVES = 4

def splitter(df):
    df_ = pd.DataFrame()
    sum_purchase = df['purchaseid'].nunique()
    amount = round((sum_purchase / 100) * PERCENTAGE_SPLIT)
    random_list = random.sample(df['purchaseid'].unique().tolist(), amount)
    df_ = df.loc[df['purchaseid'].isin(random_list)]
    df_reduced = df.loc[~df['purchaseid'].isin(random_list)]
    return [df_reduced, df_]

def generate_matrix(df_main, dataframe, name):
    mat = sp.dok_matrix((df_main.shape[0], len(df_main['itemid'].unique())), dtype=np.float32)
    for purchaseid, itemid in zip(dataframe['purchaseid'], dataframe['itemid']):
        mat[purchaseid, itemid] = 1.0
    return mat
dfs = splitter(df)
df_tr = dfs[0].copy(deep=True)
df_val = dfs[1].copy(deep=True)
train_mat = generate_matrix(df, df_tr, 'train')
val_mat = generate_matrix(df, df_val, 'val')
num_users, num_items = train_mat.shape
def get_train_samples(train_mat, num_negatives):
    user_input, item_input, labels = [], [], []
    num_user, num_item = train_mat.shape
    for (u, i) in train_mat.keys():
        user_input.append(u)
        item_input.append(i)
        labels.append(1)
        # negative instances
        for t in range(num_negatives):
            j = np.random.randint(num_item)
            while (u, j) in train_mat.keys():
                j = np.random.randint(num_item)
            user_input.append(u)
            item_input.append(j)
            labels.append(0)
    return user_input, item_input, labels
user_input, item_input, labels = get_train_samples(train_mat, NUM_NEGATIVES)
val_user_input, val_item_input, val_labels = get_train_samples(val_mat, NUM_NEGATIVES)
hist = model.fit([np.array(user_input), np.array(item_input)], np.array(labels),
                 validation_data=([np.array(val_user_input), np.array(val_item_input)], np.array(val_labels)))
from sklearn.metrics import classification_report
x_train = user_input, item_input
y_train = labels
x_test = val_user_input, val_item_input
y_test = val_labels
y_pred = model.predict(([np.array(val_user_input), np.array(val_item_input)], np.array(val_labels)), batch_size=64, verbose=1)
y_pred_bool = np.argmax(y_pred, axis=1)
print(classification_report(y_test, y_pred_bool))
Example:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix
# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names
# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01).fit(X_train, y_train)
np.set_printoptions(precision=2)
# Plot non-normalized confusion matrix
title = ("Confusion matrix, without normalization")
disp = plot_confusion_matrix(classifier, X_test, y_test, display_labels=class_names, cmap=plt.cm.Blues,)
disp.ax_.set_title(title)
print(title)
print(disp.confusion_matrix)
plt.show()
Try:
import seaborn as sns
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix, roc_curve, auc
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True)
[OUT] ValueError: Classification metrics can't handle a mix of binary and continuous targets
EDIT:
I'm using the NCF model.

The architecture of a Neural Collaborative Filtering model, taken from the Neural Collaborative Filtering paper.
# full NCF model
def get_model(num_users, num_items, latent_dim=8, dense_layers=[64, 32, 16, 8],
              reg_layers=[0, 0, 0, 0], reg_mf=0):
    # input layer
    input_user = Input(shape=(1,), dtype='int32', name='user_input')
    input_item = Input(shape=(1,), dtype='int32', name='item_input')

    # embedding layer
    mf_user_embedding = Embedding(input_dim=num_users, output_dim=latent_dim,
                                  name='mf_user_embedding',
                                  embeddings_initializer='RandomNormal',
                                  embeddings_regularizer=l2(reg_mf), input_length=1)
    mf_item_embedding = Embedding(input_dim=num_items, output_dim=latent_dim,
                                  name='mf_item_embedding',
                                  embeddings_initializer='RandomNormal',
                                  embeddings_regularizer=l2(reg_mf), input_length=1)
    mlp_user_embedding = Embedding(input_dim=num_users, output_dim=int(dense_layers[0]/2),
                                   name='mlp_user_embedding',
                                   embeddings_initializer='RandomNormal',
                                   embeddings_regularizer=l2(reg_layers[0]),
                                   input_length=1)
    mlp_item_embedding = Embedding(input_dim=num_items, output_dim=int(dense_layers[0]/2),
                                   name='mlp_item_embedding',
                                   embeddings_initializer='RandomNormal',
                                   embeddings_regularizer=l2(reg_layers[0]),
                                   input_length=1)

    # MF latent vector
    mf_user_latent = Flatten()(mf_user_embedding(input_user))
    mf_item_latent = Flatten()(mf_item_embedding(input_item))
    mf_cat_latent = Multiply()([mf_user_latent, mf_item_latent])

    # MLP latent vector
    mlp_user_latent = Flatten()(mlp_user_embedding(input_user))
    mlp_item_latent = Flatten()(mlp_item_embedding(input_item))
    mlp_cat_latent = Concatenate()([mlp_user_latent, mlp_item_latent])
    mlp_vector = mlp_cat_latent

    # build dense layers for model
    for i in range(1, len(dense_layers)):
        layer = Dense(dense_layers[i],
                      activity_regularizer=l2(reg_layers[i]),
                      activation='relu',
                      name='layer%d' % i)
        mlp_vector = layer(mlp_vector)

    predict_layer = Concatenate()([mf_cat_latent, mlp_vector])
    result = Dense(1, activation='sigmoid',
                   kernel_initializer='lecun_uniform', name='result')

    model = Model(inputs=[input_user, input_item], outputs=result(predict_layer))
    return model
The output of your model, y_pred, does not contain class values; since the final layer is a single sigmoid unit, its entries can be interpreted as class probabilities. In particular, there is not just one confusion matrix to compute: you first have to select a discrimination threshold, that is, a value between 0 and 1 that defines which of the continuous values in your prediction should be assigned to which class.
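To see the mismatch concretely, here is a minimal sketch (with made-up stand-ins for y_test and y_pred) that reproduces the error from the Try section:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up stand-ins: binary ground-truth labels vs. raw sigmoid outputs.
y_test = np.array([1, 0, 1, 0])
y_pred = np.array([0.91, 0.12, 0.55, 0.30])  # continuous scores, not class labels

try:
    confusion_matrix(y_test, y_pred)
except ValueError as e:
    # sklearn refuses to compare binary labels against continuous values
    print(e)
```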
Assume you found out that 0.8 is a good threshold, that is, every entry in y_pred larger than 0.8 is assigned to class 1 and every smaller entry is assigned to class 0. Then the following should compute the confusion matrix for that particular threshold:
import seaborn as sns
import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, (y_pred > 0.8).astype(int))
sns.heatmap(cm, annot=True)
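If you are unsure which threshold to pick, one common option is to scan the candidate thresholds returned by roc_curve and keep the one maximizing Youden's J statistic (tpr - fpr). A sketch with made-up labels and scores (not your actual y_test/y_pred):

```python
import numpy as np
from sklearn.metrics import roc_curve, confusion_matrix

# Hypothetical ground truth and continuous model scores.
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.65, 0.3, 0.85])

# roc_curve returns one (fpr, tpr) pair per candidate threshold;
# Youden's J (tpr - fpr) picks the point furthest from the diagonal.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = thresholds[np.argmax(tpr - fpr)]

# roc_curve treats scores >= threshold as positive, so use >= here too.
cm = confusion_matrix(y_true, (y_score >= best).astype(int))
print(best, cm.tolist())
```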