
Finding Tf-Idf Scores of only selected words from set of documents using scikit-learn

Original post: 2016-03-16 16:38:51 · Tags: python, scipy, nlp, scikit-learn, tf-idf

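I have a set of documents (stored as .txt files). I also have a Python dictionary of some selected words. I want to assign tf-idf scores only to these words, not to every word in the document set. How can this be done using scikit-learn or any other library?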

I have referred to this blog post, but it gives scores for the complete vocabulary.

1 Answer

You can do this with CountVectorizer, which scans the documents as text and converts them into a term-document matrix, and then apply TfidfTransformer to that matrix.
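A minimal sketch of that two-step approach, restricted to the selected words through the `vocabulary` parameter of CountVectorizer. The sample documents and the word list are hypothetical stand-ins for the .txt files and the dictionary mentioned in the question.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical stand-ins for the contents of the .txt files.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Hypothetical selected words; the keys of a dict would work the same way.
selected = ["cat", "dog", "mat"]

# Step 1: count only the selected words (columns follow the list order).
counter = CountVectorizer(vocabulary=selected)
counts = counter.fit_transform(docs)  # sparse matrix, one row per document

# Step 2: re-weight the raw counts as tf-idf.
tfidf = TfidfTransformer().fit_transform(counts)

for row in tfidf.toarray():
    print(dict(zip(selected, row.round(3))))
```

Note that the default tokenizer matches whole tokens, so "cats" and "dogs" in the third document do not count toward "cat" or "dog", and that row stays all zeros.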

The two steps can also be combined into one by using TfidfVectorizer.
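The combined form might look like the sketch below. Here `word_dict` is a hypothetical dictionary of selected words; only its keys are passed on as the vocabulary, since its values are assumed to be unrelated to column indices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the contents of the .txt files.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Hypothetical dictionary of selected words (the values are not used here).
word_dict = {"cat": "animal", "dog": "animal", "mat": "object"}

# Counting and tf-idf weighting in one step, limited to the selected words.
vectorizer = TfidfVectorizer(vocabulary=sorted(word_dict))
scores = vectorizer.fit_transform(docs)

# Look up one word's score in one document through its column index.
col = vectorizer.vocabulary_["dog"]
print(scores[1, col])
```

For documents on disk, `TfidfVectorizer(input="filename", ...)` accepts a list of file paths instead of strings. Because the default L2 normalization is applied over the selected columns only, the values may differ from the same columns of a full-vocabulary matrix.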

They are in the sklearn.feature_extraction.text module [link].

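Both approaches return the same sparse matrix representation, which I assume you would then pass through an SVD transform with TruncatedSVD to get a smaller dense matrix.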
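As a rough illustration of that reduction step, the sketch below builds a full-vocabulary tf-idf matrix and reduces it with TruncatedSVD. The sample documents, `n_components=2`, and `random_state=0` are arbitrary choices made for the example.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the contents of the .txt files.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Sparse tf-idf matrix over the full vocabulary.
sparse_scores = TfidfVectorizer().fit_transform(docs)

# Project onto 2 components; the result is a dense NumPy array.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(sparse_scores)

print(sparse_scores.shape, "->", reduced.shape)
```

The reduced columns are mixtures of terms rather than individual words, so this step fits a downstream model better than it fits reading off per-word scores.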

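You can of course also do it yourself. That requires keeping two maps of term counts: one per document and one for the whole collection. That is how they work under the hood.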
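A hand-rolled version of the two-map idea could be sketched as follows. It uses the plain textbook weighting tf * ln(N / df) and whitespace tokenization, both of which are simplifying assumptions; scikit-learn's defaults smooth the idf term and L2-normalize each row, so the numbers will not match its output.

```python
import math
from collections import Counter

# Hypothetical stand-ins for the contents of the .txt files.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Hypothetical selected words.
selected = {"cat", "dog", "mat"}

# Map 1: one term-count map per document.
per_doc_counts = [Counter(doc.lower().split()) for doc in docs]

# Map 2: one overall map, counting how many documents contain each term.
doc_freq = Counter()
for counts in per_doc_counts:
    doc_freq.update(counts.keys())

n_docs = len(docs)


def tfidf(term, counts):
    """Plain tf * ln(N / df); 0.0 for terms that occur in no document."""
    df = doc_freq[term]
    if df == 0:
        return 0.0
    return counts[term] * math.log(n_docs / df)


# Score only the selected words, one dict per document.
scores = [{t: tfidf(t, c) for t in sorted(selected)} for c in per_doc_counts]

for doc_scores in scores:
    print(doc_scores)
```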

This page has some good examples.

