Simple Python implementation of collaborative topic modeling?

Question

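I came across these two papers, which combine collaborative filtering (matrix factorization) and topic modeling (LDA) to recommend articles/posts to users based on the topic terms of the articles/posts they are interested in.

The papers (in PDF) are: "Collaborative Topic Modeling for Recommending Scientific Articles" and "Collaborative Topic Modeling for Recommending GitHub Repositories".

The new algorithm is called collaborative topic regression. I was hoping to find some Python code that implements it, but to no avail. This might be a long shot, but can someone show a simple Python example?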

Answer 1

This should get you started (although I am not sure why it hasn't been posted yet): https://github.com/arongdari/python-topic-model

More specifically: https://github.com/arongdari/python-topic-model/blob/master/ptm/collabotm.py

class CollaborativeTopicModel:
    """
    Wang, Chong, and David M. Blei. "Collaborative topic 
                                modeling for recommending scientific articles."
    Proceedings of the 17th ACM SIGKDD international conference on Knowledge
                                discovery and data mining. ACM, 2011.
    Attributes
    ----------
    n_item: int
        number of items
    n_user: int
        number of users
    R: ndarray, shape (n_user, n_item)
        user x item rating matrix
    """

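It looks nice and straightforward. I would still suggest at least having a look at gensim. Radim has done a fantastic job of optimizing that software.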
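
For intuition about what such a class has to compute, here is a minimal NumPy sketch of the alternating least-squares updates from the Wang and Blei paper. It is a simplification, not the code from the repository above: the per-item topic proportions `theta` are assumed to come from a separately trained LDA model and are held fixed, whereas the paper also re-estimates them. The function name `fit_ctr`, the toy data and the hyperparameter values are all illustrative assumptions.

```python
import numpy as np


def fit_ctr(R, theta, lambda_u=0.01, lambda_v=100.0, a=1.0, b=0.01, n_iter=20):
    """Simplified collaborative topic regression with fixed topic proportions.

    R     : (n_user, n_item) binary matrix, 1 where a user liked an item
    theta : (n_item, n_topic) topic proportions per item (assumed given, e.g. from LDA)
    """
    n_user, n_item = R.shape
    n_topic = theta.shape[1]
    rng = np.random.default_rng(0)
    U = 0.01 * rng.standard_normal((n_user, n_topic))  # user vectors
    V = theta.astype(float).copy()                     # item vectors start at their topics
    C = np.where(R > 0, a, b)                          # confidence: a for observed likes, b otherwise
    eye = np.eye(n_topic)

    for _ in range(n_iter):
        # u_i = (V^T C_i V + lambda_u I)^-1 V^T C_i r_i
        for i in range(n_user):
            VC = V.T * C[i]                            # weight each item column by c_ij
            U[i] = np.linalg.solve(VC @ V + lambda_u * eye, VC @ R[i])
        # v_j = (U^T C_j U + lambda_v I)^-1 (U^T C_j r_j + lambda_v theta_j)
        for j in range(n_item):
            UC = U.T * C[:, j]                         # weight each user column by c_ij
            V[j] = np.linalg.solve(UC @ U + lambda_v * eye,
                                   UC @ R[:, j] + lambda_v * theta[j])
    return U, V


# Toy data (made up): items 0 and 1 are mostly topic 0, items 2 and 3 mostly topic 1.
theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
R = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 1]])

U, V = fit_ctr(R, theta)
scores = U @ V.T          # predicted preference of every user for every item
print(np.round(scores[0], 2))
```

User 0 has only liked item 0, so the unseen item 1, which shares its dominant topic, should score higher for that user than items 2 and 3. The large `lambda_v` keeps each item vector close to its topic proportions, which is what lets the model score items that nobody has rated yet.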

Answer 2

A very simple LDA implementation using gensim. You can find more information here: https://radimrehurek.com/gensim/tutorial.html

I hope it helps you.

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import RSLPStemmer
from gensim import corpora
import gensim

# Requires the NLTK tokenizer, 'stopwords' and 'rslp' data (see nltk.download).
# Note: RSLPStemmer is a Portuguese stemmer; for English text,
# nltk.stem.PorterStemmer is the usual choice.
st = RSLPStemmer()
stop_words = set(stopwords.words('english'))
texts = []

doc1 = "Veganism is both the practice of abstaining from the use of animal products, particularly in diet, and an associated philosophy that rejects the commodity status of animals"
doc2 = "A follower of either the diet or the philosophy is known as a vegan."
doc3 = "Distinctions are sometimes made between several categories of veganism."
doc4 = "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat but also egg and dairy products and other animal-derived foodstuffs."
doc5 = "Some dietary vegans choose to wear clothing that includes animal products (for example, leather or wool)."

docs = [doc1, doc2, doc3, doc4, doc5]

for doc in docs:
    # lowercase, tokenize, drop stop words, then stem each remaining token
    tokens = word_tokenize(doc.lower())
    stopped_tokens = [w for w in tokens if w not in stop_words]
    stemmed_tokens = [st.stem(w) for w in stopped_tokens]
    texts.append(stemmed_tokens)

# map each token to an integer id, then convert documents to bag-of-words vectors
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# generate LDA model using gensim
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
print(ldamodel.print_topics(num_topics=2, num_words=4))

[(0, u'0.066*animal + 0.065*, + 0.047*product + 0.028*philosophy'), (1, u'0.085*. + 0.047*product + 0.028*dietary + 0.028*veg')]

Answer 3

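You have tagged this with machine-learning and python. Have you looked at the Python pandas and sklearn modules? With those two you can quickly create a lot of linear regression objects.

There is also a code example on topic extraction (with Non-negative Matrix Factorization and Latent Dirichlet Allocation) that may cover exactly what you need and also help you explore the sklearn module.

Regards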
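
As a rough idea of what topic extraction with sklearn looks like, here is a minimal sketch that fits both LDA and NMF with two topics. It is not the official sklearn example; the five sentences about veganism are sample documents chosen for illustration, and `top_words` is a hypothetical helper. The per-document topic proportions returned by LDA are the kind of input that collaborative topic regression builds on.

```python
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "Veganism is both the practice of abstaining from the use of animal products, "
    "particularly in diet, and an associated philosophy that rejects the commodity status of animals",
    "A follower of either the diet or the philosophy is known as a vegan.",
    "Distinctions are sometimes made between several categories of veganism.",
    "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat "
    "but also egg and dairy products and other animal-derived foodstuffs.",
    "Some dietary vegans choose to wear clothing that includes animal products "
    "(for example, leather or wool).",
]


def top_words(model, vectorizer, n_words=4):
    """Return the n_words highest-weighted terms of every topic (hypothetical helper)."""
    # vocabulary_ maps term -> column index; sort the terms by that index
    vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    return [[vocab[i] for i in topic.argsort()[::-1][:n_words]]
            for topic in model.components_]


# LDA works on raw term counts
count_vec = CountVectorizer(stop_words="english")
counts = count_vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # one row of topic proportions per document

# NMF is usually applied to TF-IDF weights
tfidf_vec = TfidfVectorizer(stop_words="english")
tfidf = tfidf_vec.fit_transform(docs)
nmf = NMF(n_components=2, random_state=0, max_iter=500)
nmf.fit(tfidf)

print(top_words(lda, count_vec))
print(top_words(nmf, tfidf_vec))
print(doc_topics.round(2))
```

With a corpus this small the topics are not meaningful; the point is only the shape of the pipeline: vectorize, fit, then read topics from `components_` and document mixtures from `fit_transform`.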

3 answers

Answer 1: score 5, 2016-10-12 20:42:11
Answer 2: score 0, 2016-12-04 02:21:05
Answer 3: score -2, 2016-10-03 16:55:19
