Topic modeling on Twitter data using Python

Question

I want to perform topic modeling on Twitter data stored in CSV format. I have loaded the data into Jupyter.

    # Import pandas as pd
    import pandas as pd
    # Load the dataset
    tweet_data = pd.read_csv("C://Users/shivam/Desktop/USA_TWEETS .csv", sep='\t', names = ["Date", "ID", "Place", "Text", "Username"])
    tweet_data_df = pd.DataFrame(tweet_data)

Now I want to apply topic modeling to the text variable. How should I proceed?

    # Store the Text column of the data frame in a separate object
    tweets = tweet_data.Text

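Please suggest some code; I am new to the concept of topic modeling.

I am trying to do something like this, but I get the error: TypeError: expected string or bytes-like object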

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    import pandas as pd
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    from nltk.stem import PorterStemmer



    stemmer = PorterStemmer()

    extracted_data = []


    for x in range(0, len(data)-1):
        for word in word_tokenize(text_data.tolist()[x]):
            extracted_data.append(word)

    print(extracted_data)

Please help me with this, or point me to other code for applying topic modeling from scratch. Thanks in advance.

Dataset: https://drive.google.com/open?id=0B5i9wCO1uYC9aV9fVHg4dHVidjQ

Answer 1

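I would suggest using the lda package. The scikit-learn package is handy for dimensionality reduction, but it is not very convenient for getting the topic words and the document-topic distributions.

The code looks similar to the following, which is copied from here.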

>>> import numpy as np
>>> import lda
>>> import lda.datasets
>>> X = lda.datasets.load_reuters()
>>> vocab = lda.datasets.load_reuters_vocab()
>>> titles = lda.datasets.load_reuters_titles()
>>> X.shape
(395, 4258)
>>> X.sum()
84010
>>> model = lda.LDA(n_topics=20, n_iter=1500, random_state=1)
>>> model.fit(X)  # model.fit_transform(X) is also available
>>> topic_word = model.topic_word_  # model.components_ also works
>>> n_top_words = 8
>>> for i, topic_dist in enumerate(topic_word):
...     topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
...     print('Topic {}: {}'.format(i, ' '.join(topic_words)))

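Explore the variables X, vocab, and titles to understand what they are and how to build them for your own dataset. To build the n_doc x n_vocab matrix, you can use one of scikit-learn's vectorizers.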

Answer 2

You get that error because you need to pass a string to the classifier.

Use this:

tweet = " ".join(tweet)
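To illustrate what "needs a string" means, here is a small sketch that uses only the standard library. One common way to hit this error (an assumption on my part, since the data is not shown) is a missing text cell, which pandas represents as a float NaN. The tokenize helper is a simplified, hypothetical stand-in for nltk's word_tokenize, and the rows are made up.

```python
import re


def tokenize(text):
    # Simplified stand-in for nltk's word_tokenize: regex-based, so it needs a str
    return re.findall(r"\w+", text)


# Hypothetical rows; float("nan") mimics a missing text cell in a pandas column
rows = ["First tweet here", float("nan"), "Second tweet"]

try:
    tokenize(rows[1])  # passing a float instead of a str
except TypeError as err:
    print("TypeError:", err)

# Keep only real strings before tokenizing
clean = [r for r in rows if isinstance(r, str)]
tokens = [word for r in clean for word in tokenize(r)]

# Joining the cleaned rows gives a single string, as in the line above
joined = " ".join(clean)
print(tokens)
```

If missing values turn out to be the cause, dropping or converting them before tokenizing should be enough; joining is only needed when a single string is wanted.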

Solution 1
0 2017-04-27 23:52:19

Solution 2
0 2017-09-25 11:41:50