
Content based recommender system with sklearn or numpy

I am trying to build a content-based recommender system in python/pandas/numpy/sklearn.

Here are the matrices involved and their sizes:

X: n_customers * n_features (contains the features of each customer)

Y: n_customers * n_products (contains the scores given by each customer to each product)

Theta: n_features * n_products

The aim is to learn Theta in order to predict the score a customer would give to every product (X*Theta). Y is a sparse matrix: each customer scores only a very small percentage of the products, which is why Y contains a lot of NaN values.

Here is my problem:

This is a regression problem with many targets (here target = product), but I want to run the regression only on the non-null values. Because the number of NaN values differs from one product to another, how can I vectorize that?
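One way to write the problem down, assuming a ridge (L2) penalty of strength \alpha and writing \Omega_j for the set of customers who scored product j (both symbols are notation introduced here, not taken from the post):

```latex
\min_{\Theta} \; \sum_{j=1}^{n_{\text{products}}}
\left[ \sum_{i \in \Omega_j} \left( x_i^{\top} \theta_j - Y_{ij} \right)^2
+ \alpha \, \lVert \theta_j \rVert_2^2 \right],
\qquad
\theta_j = \left( X_{\Omega_j}^{\top} X_{\Omega_j} + \alpha I \right)^{-1} X_{\Omega_j}^{\top} y_{\Omega_j}
```

Here x_i is row i of X, \theta_j is column j of Theta, and X_{\Omega_j}, y_{\Omega_j} keep only the rows of X and the entries of column j of Y where a score exists. The objective is a sum of independent per-product terms, so each \theta_j can be solved on its own; the difficulty is only that \Omega_j changes from product to product.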

Assume there are 1000 products and 100 000 customers, each one having 20 features.

For each product I need to run the regression on the non-null values only. So without vectorization, I would need 1000 different regressors, each one learning a Theta vector of length 20.
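The non-vectorized version described above could look roughly like this. It is only a sketch: the function name fit_theta_per_product, the choice of Ridge with alpha=1.0, and fit_intercept=False (so that predictions are exactly X*Theta) are assumptions, not part of the original post.

```python
import numpy as np
from sklearn.linear_model import Ridge


def fit_theta_per_product(X, Y, alpha=1.0):
    """Fit one ridge regressor per product, using only the customers who scored it.

    X: (n_customers, n_features), Y: (n_customers, n_products) with NaN for missing scores.
    Returns Theta of shape (n_features, n_products).
    """
    n_features, n_products = X.shape[1], Y.shape[1]
    Theta = np.zeros((n_features, n_products))
    for j in range(n_products):
        rated = ~np.isnan(Y[:, j])      # customers who scored product j
        if not rated.any():
            continue                    # no scores at all: leave the column at zero
        model = Ridge(alpha=alpha, fit_intercept=False)
        model.fit(X[rated], Y[rated, j])
        Theta[:, j] = model.coef_
    return Theta
```

With 1000 products and 20 features each fit is small, so this loop is a reasonable baseline to check any vectorized version against.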

If possible, I would like to solve this problem with sklearn. Ridge regression, for example, supports multiple targets (Y as a matrix).
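As far as I know, sklearn's Ridge rejects NaN values in Y, so its multi-target mode cannot skip the missing scores by itself. A possible workaround is to solve the masked ridge normal equations for all products at once in numpy. The sketch below assumes an L2 penalty alpha and no intercept; the function name fit_theta_masked is hypothetical.

```python
import numpy as np


def fit_theta_masked(X, Y, alpha=1.0):
    """Solve all per-product ridge problems in one batched call.

    For each product j this solves
        (X^T diag(m_j) X + alpha * I) theta_j = X^T diag(m_j) y_j
    where m_j is the 0/1 indicator of customers who scored product j.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, f = X.shape
    p = Y.shape[1]

    mask = ~np.isnan(Y)                      # (n, p) True where a score exists
    M = mask.astype(float)
    Y0 = np.where(mask, Y, 0.0)              # missing scores contribute nothing

    # Per-customer outer products x_i x_i^T, flattened to (n, f*f).
    # Note: this needs n * f * f floats of memory (about 320 MB for n=100000, f=20).
    XX = (X[:, :, None] * X[:, None, :]).reshape(n, f * f)

    # A[j] = sum over customers who scored j of x_i x_i^T, plus the ridge term.
    A = (M.T @ XX).reshape(p, f, f) + alpha * np.eye(f)
    # B[j] = sum over customers who scored j of x_i * y_ij.
    B = (X.T @ Y0).T                         # (p, f)

    # Batched solve: one (f x f) system per product.
    Theta = np.linalg.solve(A, B[..., None])[..., 0]
    return Theta.T                           # (n_features, n_products)
```

With alpha > 0 every system is well conditioned, and a product with no scores simply gets a zero column. Predictions for all customers and products are then X @ Theta.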

I hope it's clear enough.

Thank you for your help.

I believe you can make this work with centered cosine similarity (Pearson correlation), using a collaborative filtering technique.

To use Pearson correlation, you need to fill the null entries (the fields that have no score) with zero. Because Pearson correlation centers each customer's scores around zero, a zero then stands for an average score rather than a low one, which gives better recommendations.
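A minimal sketch of what this could look like in numpy, assuming an item-item variant: each customer's scores are centered on that customer's mean over the products they actually scored, the missing entries are then set to zero, and products are compared with cosine similarity. The function name and the item-item choice are assumptions, since the answer does not say which variant it has in mind.

```python
import numpy as np


def centered_cosine_similarity(Y):
    """Product-product centered cosine similarity from a score matrix with NaNs.

    Y: (n_customers, n_products), NaN where a customer gave no score.
    Returns an (n_products, n_products) similarity matrix.
    """
    Y = np.asarray(Y, dtype=float)
    mask = ~np.isnan(Y)

    # Mean score of each customer over the products they actually scored.
    counts = mask.sum(axis=1, keepdims=True)
    sums = np.where(mask, Y, 0.0).sum(axis=1, keepdims=True)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

    # Center the observed scores; missing entries become 0 (an "average" score).
    centered = np.where(mask, Y - means, 0.0)

    # Normalize each product column, guarding against all-zero columns.
    norms = np.linalg.norm(centered, axis=0)
    norms[norms == 0] = 1.0
    unit = centered / norms

    return unit.T @ unit
```

A score for an unseen product could then be estimated from the customer's scores on the most similar products, weighted by these similarities. Note that this approach uses only Y and ignores the customer features in X.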
