
User profiling for topic-based recommender system

I'm trying to come up with a topic-based recommender system to suggest relevant text documents to users.

I trained a latent semantic indexing (LSI) model with gensim on the Wikipedia corpus. This lets me easily transform documents into LSI topic vectors. My idea now is to represent users the same way. Users, however, have a history of viewed articles as well as ratings for those articles.

So my question is: how should I represent the users?

One idea I had is to represent a user as the aggregation of all the documents they have viewed. But how do I take the ratings into account?

Any ideas?

Thanks

I don't think that will work with LSA.

But maybe you could do some sort of k-NN classification, where each user's coordinates are the documents viewed. Each object (= user) sends out radiation whose intensity is inversely proportional to the square of the distance. The intensity is calculated from the ratings of the individual documents.

Then you can place an object (user) in this high-dimensional space and see which other users give it the most 'light'.

But can't Apache Lucene do all of that for you?
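One possible reading of this idea, as a rough sketch rather than a definitive implementation: each user is a vector with one coordinate per document, the coordinate being that user's rating (0 if the document was not viewed), and the 'light' a user receives from another user is 1 / distance². The function names, user names, and toy ratings below are all made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brightest_neighbours(target, others, k=2):
    """Return the k users whose 'light' at the target's position is strongest.

    Intensity follows an inverse-square law: 1 / distance^2.
    A user at exactly the same position gets infinite intensity.
    """
    scores = []
    for name, vec in others.items():
        d2 = squared_distance(target, vec)
        intensity = float("inf") if d2 == 0 else 1.0 / d2
        scores.append((name, intensity))
    # Strongest light first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical data: ratings for three documents, 0 means "not viewed".
target_user = [5, 0, 3]
other_users = {
    "alice": [4, 0, 3],
    "bob": [0, 5, 0],
    "carol": [5, 1, 1],
}

neighbours = brightest_neighbours(target_user, other_users, k=2)
print(neighbours)
```

With real data the rating vectors would be very sparse, so a distance restricted to documents both users have rated may behave better than the plain Euclidean distance assumed here.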

"represent a user as the aggregation of all the documents viewed": that might indeed work, given that you are working in linear spaces. You can simply add all the document vectors together into one big vector.

If you want to include the ratings, you could simply put a coefficient in the sum.

Say you sum all documents rated 2 into a vector D2, all documents rated 3 into D3, and so on. You then define the user vector as U = c2*D2 + c3*D3 + ... You can play with various forms for c2, c3, ..., but the easiest approach is to multiply by the rating and divide by the maximum rating for normalisation.

If your maximum rating is 5, you could define, for instance, c2 = 2/5, c3 = 3/5, ...
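A minimal sketch of that weighted sum in plain Python, assuming each document is already available as a dense list of LSI topic weights. The helper names `user_vector` and `cosine` and the toy three-topic vectors are hypothetical. Weighting every document by rating / max_rating gives the same result as first grouping the documents by rating into D2, D3, ...

```python
import math


def user_vector(doc_vectors, ratings, max_rating=5):
    """Sum the document vectors, each weighted by rating / max_rating."""
    if len(doc_vectors) != len(ratings):
        raise ValueError("need exactly one rating per document vector")
    user = [0.0] * len(doc_vectors[0])
    for vec, rating in zip(doc_vectors, ratings):
        weight = rating / max_rating  # e.g. c2 = 2/5, c3 = 3/5
        for k, value in enumerate(vec):
            user[k] += weight * value
    return user


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Toy topic vectors for three viewed documents and the user's ratings.
viewed = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
ratings = [5, 2, 3]

u = user_vector(viewed, ratings)

# Rank two unseen candidate documents by similarity to the user vector.
candidates = {"doc_a": [1.0, 0.0, 0.0], "doc_b": [0.0, 1.0, 0.0]}
ranked = sorted(candidates, key=lambda name: cosine(u, candidates[name]), reverse=True)
print(u, ranked)
```

Because the user vector lives in the same topic space as the documents, candidates can be ranked by cosine similarity to it. Here the highly rated first topic dominates, so `doc_a` ranks above `doc_b`.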


 