
Python NLTK - Preventing stop word removal from removing EVERY word

Original post: 2016-11-18 18:21:03 7 2 python/ nltk

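I'm working with very short strings of words, and some of them are silly. Hypothetically, I could have the string "you a a", and if I remove the stop words, that string ends up empty. Since I'm classifying in a loop, an empty string makes the whole thing stop with an error. I wrote the following code to work around this: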

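from nltk.corpus import stopwords

def title_features(words):
    filter_words = [word for word in words.split() if word not in stopwords.words('english')]
    features = {}
    if len(filter_words) >= 1:
        # Use the first word that survived stop word removal
        features['First word'] = ''.join(filter_words[0])
    else:
        # Every word was a stop word, so fall back to the first original word
        features['First word'] = ''.join(words.split()[0])
    return features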

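This makes sure there are no errors, but I'd like to know whether there is a more efficient way to do it, or a way to keep the words from all disappearing when every one of them is a stop word.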

2 Answers

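The simplest solution is to check the filtered result and restore the full word list if necessary. The rest of the code can then use a single variable without any further checks.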

from nltk.corpus import stopwords

def title_features(words):
    filter_words = [word for word in words.split() if word not in stopwords.words('english')]
    if not filter_words:       # Use full list if necessary
        filter_words = words.split()   # split: words is a string, not a list

    features={}
    features['First word'] = filter_words[0]
    features[...] = ...

    return features
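Since the question also asks about efficiency: the comprehension above calls stopwords.words('english') once per word and searches a list each time. A minimal sketch of the same fallback with the stop words built once as a set is shown below. STOP_WORDS here is a small hypothetical stand-in so the sketch runs without the NLTK corpus; with NLTK available you would presumably pass set(stopwords.words('english')) instead. The stop_words parameter and the empty-input guard are additions for illustration, not part of the original answer.

```python
# Hypothetical stand-in for set(stopwords.words('english')), so the sketch
# runs without downloading the NLTK stop word corpus.
STOP_WORDS = {"you", "a", "an", "the", "is", "of"}

def title_features(words, stop_words=STOP_WORDS):
    tokens = words.split()
    # The set is built once, and membership tests on a set are fast.
    filter_words = [w for w in tokens if w not in stop_words]
    if not filter_words:          # every token was a stop word
        filter_words = tokens     # fall back to the unfiltered tokens
    features = {}
    # An empty or whitespace-only input has no first word, so guard it.
    features['First word'] = filter_words[0] if filter_words else ''
    return features

print(title_features("you a a"))        # {'First word': 'you'}
print(title_features("the quick fox"))  # {'First word': 'quick'}
```

Keeping the tokens in one variable means the string is split only once and the fallback needs no second pass over the input.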

You can rewrite it as:

def title_features(words):
    filtered = [word for word in words.split() if word not in stopwords.words('english')]
    return {'First word': (filtered or words.split(None, 1) or [''])[0]}

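If filtered is non-empty (that is, it has a length of one or more), the expression uses filtered. If it is empty, evaluation moves on to splitting the original string, and if that is also empty, it defaults to a list containing a single empty string. You then take [0] of whichever one was chosen to get the first element (the first non-stop word, the first word of the string, or an empty string).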
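To see the three cases of that `or` chain in isolation, here is a small sketch. It assumes a tiny hypothetical stop word set (STOP_WORDS) in place of stopwords.words('english') so it runs without the NLTK corpus, and the function name first_word is made up for the illustration.

```python
# Hypothetical stand-in for the NLTK English stop word list.
STOP_WORDS = {"you", "a", "the"}

def first_word(words, stop_words=STOP_WORDS):
    filtered = [w for w in words.split() if w not in stop_words]
    # `or` returns its first truthy operand, and an empty list is falsy,
    # so the chain falls through until it finds a non-empty list.
    chosen = filtered or words.split(None, 1) or ['']
    return chosen[0]

for text in ["the quick fox", "you a a", "   "]:
    print(repr(text), '->', repr(first_word(text)))
# 'the quick fox' -> 'quick'
# 'you a a' -> 'you'
# '   ' -> ''
```

The final `['']` matters only for empty or whitespace-only input, where both the filtered list and the split of the original string are empty.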
