fit_transform error when finding tf-idf in Python

Question

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
mylist = [
    'a a b c',
    'a c c c d e f',
    'a c d d d',
    'a d f',
]
df = pd.DataFrame({"texts": mylist})
tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 1))
# Raises ValueError: empty vocabulary; perhaps the documents only contain stop words
tfidf_separate = tfidf_vectorizer.fit_transform(df["texts"])

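I'm trying to find the tf-idf value of "d" in the third document (row index 2). However, fit_transform fails with an empty-vocabulary error: "ValueError: empty vocabulary; perhaps the documents only contain stop words".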

Any suggestions on how to fix this error would be greatly appreciated!
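One way to see where the error comes from, assuming a recent scikit-learn with default settings: the default word analyzer uses token_pattern r"(?u)\b\w\w+\b", which only matches tokens of two or more word characters, so every single-letter token in these documents is discarded and nothing is left to build a vocabulary from. The sketch below runs the analyzer returned by build_analyzer() on the third document, once with the default pattern and once with the relaxed pattern r"(?u)\b\w+\b" (the relaxed pattern is an illustrative choice, not part of the original question).

```python
from sklearn.feature_extraction.text import TfidfVectorizer

doc = 'a c d d d'  # third document from the question

# Default word analyzer: token_pattern keeps only tokens of 2+ characters
default_tokens = TfidfVectorizer().build_analyzer()(doc)

# Illustrative relaxed pattern that also keeps 1-character tokens
relaxed_tokens = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b").build_analyzer()(doc)

print(default_tokens)  # []
print(relaxed_tokens)  # ['a', 'c', 'd', 'd', 'd']
```

Because every document in the question consists only of single letters, the default analyzer returns an empty token list for all of them, which is exactly the empty-vocabulary condition fit_transform reports.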

Answer 1

You can do it like this:

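Set analyzer='char' so that TfidfVectorizer tokenizes each document into single characters. (With the default word analyzer, token_pattern only keeps tokens of two or more characters, so documents made entirely of single letters produce an empty vocabulary.)
Look up the column index of d in vocabulary_ and use it to index the resulting matrix.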

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
mylist = [
    'a a b c',
    'a c c c d e f',
    'a c d d d',
    'a d f',
]
df = pd.DataFrame({"texts": mylist})
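# analyzer='char' tokenizes by character instead of by word
tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 1), analyzer='char')
tfidf_separate = tfidf_vectorizer.fit_transform(df["texts"])
ind = tfidf_vectorizer.vocabulary_['d']  # column index of 'd'
print(tfidf_separate.todense()[2, ind])  # row index 2 = third document
# Output: 0.6490674853546846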
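Note that with analyzer='char' the space between letters also becomes a feature in vocabulary_, and it contributes to each row's L2 normalization. If the intent is the tf-idf of the word "d" (an assumption about what the asker wants), an alternative sketched below is to keep the default word analyzer and relax token_pattern so single-character tokens are kept. The pattern r"(?u)\b\w+\b" is an illustrative choice; the resulting value differs from the 0.649 above because the space feature is no longer part of the row.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

mylist = [
    'a a b c',
    'a c c c d e f',
    'a c d d d',
    'a d f',
]
df = pd.DataFrame({"texts": mylist})

# With analyzer='char', the space character is also a vocabulary entry
char_vocab = TfidfVectorizer(analyzer='char').fit(df["texts"]).vocabulary_
print(sorted(char_vocab))  # [' ', 'a', 'b', 'c', 'd', 'e', 'f']

# Alternative: word tokens, but allow 1-character tokens (illustrative pattern)
word_vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
word_tfidf = word_vectorizer.fit_transform(df["texts"])
ind = word_vectorizer.vocabulary_['d']  # column index of the word 'd'
value = word_tfidf[2, ind]  # sparse matrices support [row, col] indexing directly
print(value)  # approximately 0.918
```

Indexing the sparse result directly also avoids converting the whole matrix with todense(), which matters once the corpus is large.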

Solution 1
Score 0, accepted, 2022-07-18 04:33:33