
TF-IDF vectorizer with Python

Asked 2020-05-10 09:47:46 · Tags: python, vectorization, tf-idf, tfidfvectorizer

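I have a problem with the TfidfVectorizer function in Python. For example, given a string like 'xxx//xx. aaa.bb.ccc.d', it extracts these words as the vocabulary keys: 'xxx', 'xx', 'aaa', 'bb', 'ccc', 'd'. Instead, I want to create these features: 'xxx//xx.', 'aaa.bb.ccc.d'

How can I tell TfidfVectorizer to select words that are separated only by a space (' ')?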

2 Answers

Take a look at: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

There is a parameter called token_pattern.
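As a minimal sketch of how this parameter could address the question, assuming whitespace is the only separator that matters: the pattern r"\S+" below is an illustrative choice, not something given in the answer. It treats every run of non-whitespace characters as one token, so punctuation stays attached to the word.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ['xxx//xx. aaa.bb.ccc.d']

# Assumption: tokens are separated by whitespace only, so r"\S+"
# (one or more non-whitespace characters) is used as the token pattern.
vectorizer = TfidfVectorizer(token_pattern=r"\S+")
vectorizer.fit(docs)

# vocabulary_ maps each token to its column index; sort the keys for display.
features = sorted(vectorizer.vocabulary_)
print(features)  # ['aaa.bb.ccc.d', 'xxx//xx.']
```

Note that TfidfVectorizer lowercases its input by default, so mixed-case text would need lowercase=False to keep its original casing.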

The token_pattern parameter of TfidfVectorizer is used to specify a custom splitting pattern.

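from sklearn.feature_extraction.text import TfidfVectorizer
a = ['xxx//xx. aaa.bb.ccc.d']
# (?:...) groups are non-capturing: recent scikit-learn rejects patterns with more than one capturing group
t = TfidfVectorizer(token_pattern=r"(?:[a-z]*//[a-z]*)|(?:[a-z.]*)")
t.fit(a)
print(sorted(t.vocabulary_))

Output

['', '.', 'aaa.bb.ccc.d', 'xxx//xx']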

In this case some post-cleaning is needed, since the vocabulary still contains an empty token and a lone '.'.
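One possible form of that post-cleaning, as a sketch rather than the answerer's own method: the helper name clean_tokenize and the rule "keep only tokens that contain at least one letter" are assumptions made for illustration. The regular expression uses the same two alternatives as the pattern above, written with non-capturing groups so that re.findall returns plain strings.

```python
import re

# Same two alternatives as the answer's pattern; non-capturing groups make
# findall return strings instead of tuples.
TOKEN_RE = re.compile(r"(?:[a-z]*//[a-z]*)|(?:[a-z.]*)")

def clean_tokenize(text):
    """Hypothetical helper: tokenize, then drop tokens without any letter."""
    tokens = TOKEN_RE.findall(text.lower())
    # Removes empty matches and punctuation-only tokens such as '.'.
    return [tok for tok in tokens if re.search(r"[a-z]", tok)]

print(clean_tokenize('xxx//xx. aaa.bb.ccc.d'))  # ['xxx//xx', 'aaa.bb.ccc.d']
```

A function like this could be passed to TfidfVectorizer through its tokenizer parameter, in which case token_pattern is not used.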

