Tag [stop-words] - Stack Overflow

R stopwords: getting rid of ALL the words starting with 'https'

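I'm working on a project that involves scraping Twitter. Problem: I can't seem to remove all the words starting with 'https'. My code: I added the 'https' and 'http' tags, but it didn't help. Of course I could clean the output with gsub, but that's not the same, because I still get the rest of the link name as output. Any ideas how I can do this ...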
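Stop-word removal only drops tokens that exactly equal an entry in the list, so a token like `https://t.co/abc123` (or the leftover pieces after a tokenizer splits it) never matches `https`. A common fix is to strip whole URLs with a regex before tokenizing. The question uses R, where the analogous first step would be a `gsub()` call with a URL pattern; the sketch below shows the same idea in Python, with an illustrative stop list and a hypothetical helper name.

```python
import re

# Matches a whole URL token starting with http:// or https://,
# up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

def strip_urls_then_stopwords(text, stop_words):
    """Remove URLs first, then drop stop words from the remaining tokens."""
    without_urls = URL_PATTERN.sub(" ", text)
    tokens = without_urls.lower().split()
    return [tok for tok in tokens if tok not in stop_words]

stop_words = {"the", "a", "is"}  # illustrative stop list only
print(strip_urls_then_stopwords("The link is https://t.co/abc123 nice", stop_words))
```

Removing URLs before stop words matters: once a tokenizer has split a link into fragments, no single stop-list entry can catch all of them.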

How to append stopwords from being in a text file without using nltk?

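Currently, this code outputs how often each word occurs in the input_files text documents. However, I need to leave out the stop words listed in the stopwords.txt document, and I can't use nltk for this. Essentially, what is the most efficient way ...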
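Without nltk, a plain Python `set` built from the stop-word file plus `collections.Counter` covers this. The sketch below assumes `stopwords.txt` holds one word per line and that a word is a run of lowercase letters or apostrophes; both helper names are hypothetical. The efficiency comes from set membership, which is O(1) on average instead of scanning a list for every word.

```python
import re
from collections import Counter

def load_stop_words(lines):
    """Build a set from an iterable of lines, e.g. an open stopwords.txt file.

    Assumes one stop word per line; blank lines are ignored.
    """
    return {line.strip().lower() for line in lines if line.strip()}

def word_frequencies(texts, stop_words):
    """Count words across several documents, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:  # set lookup: O(1) on average
                counts[word] += 1
    return counts

# Usage with real files (file layout assumed):
# from pathlib import Path
# with open("stopwords.txt", encoding="utf-8") as f:
#     stop_words = load_stop_words(f)
# texts = [Path(p).read_text(encoding="utf-8") for p in input_files]
# print(word_frequencies(texts, stop_words).most_common(10))
```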

Modify Stopword-Removal-Code to remove numbers as well

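I have tokenized text in a df column. The code that removes stop words from it works, but I'd also like to remove punctuation, numbers, and special characters without having to spell them all out. In particular, I want to make sure it also removes larger numbers that were tokenized as a single token. My current code is: ...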
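Rather than listing every symbol and digit, one option is to keep only tokens made entirely of letters: `str.isalpha()` is `False` for `"!"`, `"#1"`, and multi-digit tokens such as `"1,000,000"`, and it also accepts non-ASCII letters. The helper name and the column name in the usage comment are hypothetical.

```python
def clean_tokens(tokens, stop_words):
    """Drop stop words plus any token that is not purely alphabetic.

    isalpha() rejects punctuation, numbers of any length, and symbols,
    so none of them need to be spelled out.
    """
    return [t for t in tokens if t.isalpha() and t.lower() not in stop_words]

# With pandas (column name "tokens" assumed):
# df["tokens"] = df["tokens"].apply(lambda toks: clean_tokens(toks, stop_words))
print(clean_tokens(["The", "price", "is", "1,000,000", "!", "great", "#1"], {"the", "is"}))
```

The trade-off: `isalpha()` also drops hyphenated or apostrophe words such as `"state-of-the-art"` or `"don't"`, so a regex filter may suit better if those should survive.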

Not able to remove stopwords

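I have a stop word list, but the program can't remove the stop words from the corpus. I applied my code to the corpus with xtrain['question'].apply(lambda x: clean_text(x)), and the rows look like this, taking the first index as an example: 'Dok,anak saya sudah imunisasi DPT' o ...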
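The question's `clean_text` is not shown, so this is only a sketch of two frequent causes. First, splitting on whitespace alone leaves `"Dok,anak"` as a single token that never matches a stop word. Second, `Series.apply` returns a new Series, so the result must be assigned back to the column. The two-argument `clean_text` below is hypothetical, and the stop list is illustrative, not a real Indonesian list.

```python
import re

def clean_text(text, stop_words):
    # Lower-case, then split on any run of non-word characters, so that
    # "Dok,anak" becomes "dok" and "anak" instead of one token.
    tokens = re.split(r"\W+", text.lower())
    return " ".join(t for t in tokens if t and t not in stop_words)

stop_words = {"saya", "sudah"}  # illustrative only
print(clean_text("Dok,anak saya sudah imunisasi DPT", stop_words))

# With pandas, keep the returned Series:
# xtrain["question"] = xtrain["question"].apply(lambda x: clean_text(x, stop_words))
```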

Unable to update ' and "" in the stop_word list

I'm trying to add ' and " to the stop word list, but I get the following error. How can I add those characters to the stop words? ...

Tokenize sentence to remove stop words: stop words are not being removed

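My code below should fetch a sentence from the database, tokenize it into words, and then remove the stop words. For some reason, when I call the removestopwords function inside the for loop, it doesn't work. Any suggestions? When I call the removestopwords function on a sentence I type in myself, it works fine ...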
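Since the original code is not shown, the following is only an assumption about typical causes: the loop discards the list that `removestopwords` returns, or the database rows are not plain strings (many DB cursors return each row as a tuple such as `("text",)`, and text may carry trailing whitespace or capital letters). The sketch keeps each returned list and normalizes case; the sentences stand in for database rows.

```python
def removestopwords(tokens, stop_words):
    """Return a NEW list without stop words; the input list is unchanged."""
    return [t for t in tokens if t.lower() not in stop_words]

# Stand-ins for rows fetched from a database. If your cursor returns
# tuples, unpack them first, e.g. `for (sentence,) in rows:`.
sentences = ["This is the first row", "And this is another"]
stop_words = {"this", "is", "the", "and"}

cleaned = []
for sentence in sentences:
    tokens = sentence.strip().split()
    cleaned.append(removestopwords(tokens, stop_words))  # keep the result

print(cleaned)
```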

Remove words if all of them are in a stop words list

I have a group of words, which can contain one or more words. In the case of a single word it's easy to remove it, but choosing to remove multiple words only when all of them are in the stop words list is ...
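The "all words are stop words" condition maps directly onto Python's built-in `all()`. A minimal sketch, with a hypothetical helper name and an illustrative stop list:

```python
def drop_all_stopword_phrases(phrases, stop_words):
    """Keep a phrase unless EVERY one of its words is a stop word."""
    return [
        p for p in phrases
        if not all(w.lower() in stop_words for w in p.split())
    ]

stop_words = {"of", "the", "and"}
print(drop_all_stopword_phrases(["of the", "the cat", "and", "dog"], stop_words))
```

One edge case to be aware of: `all()` of an empty sequence is `True`, so an empty phrase is also dropped.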

Remove stopwords using a for loop in Python

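I've recently been studying loops in Python and wanted to see whether I could remove stop words and punctuation with a for loop. However, my code doesn't work: it says punctuation is not defined. (Thanks to the comments, stop words can now be removed.) I know this can be done with a list comprehension, and there are many answers on StackOverflow, but I'd like to know how to do it with a for loop. The code I'm practicing with is below: Expected ...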
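A `NameError` for `punctuation` usually means the name was used without being imported; it lives in the standard `string` module as `string.punctuation`. A minimal explicit-loop sketch (helper name hypothetical):

```python
import string

def remove_stopwords_and_punctuation(text, stop_words):
    result = []
    for word in text.lower().split():
        # string.punctuation holds all ASCII punctuation characters;
        # strip() trims them from the start and end of each word only.
        word = word.strip(string.punctuation)
        if word and word not in stop_words:
            result.append(word)
    return result

print(remove_stopwords_and_punctuation("Hello, world! This is a test.", {"this", "is", "a"}))
```

Because `strip` only trims the ends, inner apostrophes as in `"don't"` are kept.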

Error while removing the stop-words from the text

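I'm trying to remove stop words from my data, and I loaded the stop words with this statement: stop = set(stopwords.words('english')). This set includes the character 'd' as one of the stop words, so when I apply it in my function it removes 'd' from inside words. Please see the attached image for reference, and advise how to fix this. ...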
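The function itself is not shown, but letters vanishing from inside words typically points to substring replacement (for example `str.replace`) rather than whole-token comparison. Both variants below only touch complete words; the stop set is a small stand-in for the NLTK list, and the helper names are hypothetical.

```python
import re

def remove_stopwords(text, stop):
    """Token-level filter: a word is dropped only if the WHOLE word matches."""
    return " ".join(w for w in text.split() if w.lower() not in stop)

def remove_stopwords_regex(text, stop):
    """Regex variant: word boundaries keep "d" from matching inside "dog"."""
    alternatives = "|".join(map(re.escape, stop))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    # Replace matches with a space, then collapse repeated whitespace.
    return " ".join(pattern.sub(" ", text).split())

stop = {"d", "the", "a"}  # small stand-in for stopwords.words('english')
print(remove_stopwords("the dog had a red ball", stop))
print(remove_stopwords_regex("the dog had a red ball", stop))
```

Either way, `"d"` remains in the stop set and is removed only when it appears as a standalone token.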

Python regex: get the last words of a text up to a stop word with the re module (not the regex module)

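I'm after the last words of a text, back to the nearest stop word. Imagine I have the text: … I want to get 'blue book'. To do this, I used the regex module. Regex explanation: (?r) = reverse; (?<=(\s*\b(an|a|the|for)\b\s*)) = look behind for any stop word with word boundaries \b; (?P feature. ?) ...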
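The standard `re` module has no reverse flag like the third-party `regex` module's `(?r)`, but a greedy `.*` gives a similar effect: it backtracks from the end of the string, so the first stop word it can match is the *last* one, and a capture group takes everything after it. The sentence used below is a hypothetical example, and the stop words are the four from the question.

```python
import re

# Greedy ".*" backtracks from the end of the string, so the stop word
# it settles on is the LAST one; group 1 captures the text after it.
LAST_AFTER_STOPWORD = re.compile(
    r".*\b(?:an|a|the|for)\b\s*(.*)$", re.IGNORECASE | re.DOTALL
)

def words_after_last_stopword(text):
    """Return the text after the final stop word, or None if there is none."""
    m = LAST_AFTER_STOPWORD.match(text)
    return m.group(1) if m else None

print(words_after_last_stopword("I want to read the blue book"))
```

The `\b` boundaries stop `"an"` from matching inside `"want"`, and `"a"` from matching inside `"read"`.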

Deleting stopwords with Gensim

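I'm trying to learn Gensim from its website. There is a function called 'remove_stopword_tokens' that would be useful for my research. Although it is defined and appears on their website (exact link: link), I can't import it in my Colab notebook. Note: this is my code: import gensim fro ...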
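One plausible, unconfirmed explanation is that the installed gensim is older than the documentation being read; checking `gensim.__version__` and upgrading the package is a reasonable first step. The sketch assumes the function lives in `gensim.parsing.preprocessing` (where gensim's `remove_stopwords` helper is) and falls back to a hypothetical local version with a tiny illustrative stop list if the import fails.

```python
try:
    # Assumed location; raises ImportError on installs that lack it.
    from gensim.parsing.preprocessing import remove_stopword_tokens
except ImportError:
    # Hypothetical fallback with a tiny illustrative stop list.
    _FALLBACK_STOP_WORDS = frozenset({"the", "a", "an", "is", "of", "and"})

    def remove_stopword_tokens(tokens, stopwords=None):
        """Drop tokens found in `stopwords` (or the fallback list)."""
        stop = _FALLBACK_STOP_WORDS if stopwords is None else stopwords
        return [t for t in tokens if t not in stop]

print(remove_stopword_tokens(["the", "cat", "is", "on", "a", "mat"]))
```

The exact output depends on which stop list is active (gensim's own list is much longer than the fallback), so no expected result is shown here.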

How to remove a specific word from a row of one column and paste the removed substring to another column using Python

Words to remove = ['apple','banana']. col: 'cola apple is good'. After removal, col: 'cola is good', and the removed-word column: 'apple' ...
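A per-row helper can return both the cleaned text and the removed words, which can then be written to two columns. The helper name and the `removed` column name are hypothetical; the example row is the one from the question.

```python
def split_removed(text, remove_words):
    """Return (text without remove_words, space-joined removed words)."""
    kept, removed = [], []
    for word in text.split():
        # Route each word into one of the two lists.
        (removed if word.lower() in remove_words else kept).append(word)
    return " ".join(kept), " ".join(removed)

remove_words = {"apple", "banana"}
print(split_removed("cola apple is good", remove_words))  # ('cola is good', 'apple')

# With pandas, one option for filling two columns at once:
# df["col"], df["removed"] = zip(*df["col"].map(lambda s: split_removed(s, remove_words)))
```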

spaCy: how not to remove "not" when cleaning the text with spaCy

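I use this spaCy code to clean my text later on, but I need negation words such as "not" to stay in the text. When I apply it, I get this result: however, the original sentence is: so I'd like to see the following: [earphone, still, not, work] Thanks ...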
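The general idea is to subtract a keep-list of negations from whatever stop list is in use. In spaCy itself, one way is to filter with `[t.text for t in doc if not t.is_stop or t.lower_ in NEGATIONS]`. The library-free sketch below shows the set logic with an illustrative stop list (not spaCy's) and a hypothetical example sentence.

```python
NEGATIONS = {"not", "no", "never", "n't"}

# Illustrative stand-in for a library stop list; note it contains "not".
BASE_STOP_WORDS = {"my", "does", "the", "is", "not", "no"}

# Set difference: negations survive stop-word removal.
STOP_WORDS = BASE_STOP_WORDS - NEGATIONS

def clean(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(clean("My earphone still does not work"))  # ['earphone', 'still', 'not', 'work']
```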

Mismatch in the count of stop-words in `Defaults.stop_words` and the ones derived from `nlp.vocab`?

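Suppose we have nlp = spacy.load('en_core_web_sm'). Typing len(nlp.Defaults.stop_words) returns 326, but when I run the following code (which basically counts the stop words in the vocabulary), I get 111: Given that (possibly) Defaults.stop_words and nlp.v ...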

Stopwords in Bahasa Indonesia

How do I remove Indonesian stop words? For English, the R code is ... Thank you for your support. ...

How to perform stemming and put back the words in the original review format?

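I have a dataset in which one column, full_text, contains review text from online websites. I want to clean these reviews by removing stop words and stemming, and then put them back into their original format (all stemmed words of a review forming one sentence, i.e. one row per review instead of one stemmed word per row). I'm trying the following: However, this new column stemmed_Description looks ...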
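The key step for getting one row per review is to `" ".join()` the stems inside the per-review function, so that `apply` returns one string per row rather than one word per row. To keep the sketch self-contained it uses a hypothetical toy stemmer; in practice, swapping in NLTK's `PorterStemmer().stem` is the usual choice.

```python
import re

def toy_stem(word):
    """Hypothetical stand-in stemmer that strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def clean_review(text, stop_words, stem=toy_stem):
    tokens = re.findall(r"[a-z]+", text.lower())
    # Join the stems back with spaces: one string per review.
    return " ".join(stem(t) for t in tokens if t not in stop_words)

print(clean_review("The screens were cracked and the colors faded", {"the", "were", "and"}))

# With pandas (column names from the question):
# df["stemmed_Description"] = df["full_text"].apply(lambda s: clean_review(s, stop_words))
```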

nltk stopwords - AttributeError: 'function' object has no attribute 'words'

These are my imports: This is my code: Applying it: I got this error: AttributeError: 'function' object has no attribute 'words'. Can someone help me fix this? ...
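The imports and code are not shown, so this is only one plausible cause: a function named `stopwords` defined after `from nltk.corpus import stopwords` rebinds the name, and `.words` is then looked up on the function. The sketch reproduces that with a `SimpleNamespace` standing in for the NLTK corpus object; all names besides `stopwords` are hypothetical.

```python
from types import SimpleNamespace

def make_stopwords_module():
    # Stand-in for nltk.corpus.stopwords: an object exposing .words(lang).
    return SimpleNamespace(words=lambda lang: ["the", "a", "and"])

stopwords = make_stopwords_module()

def stopwords(text):  # BUG: this def rebinds the name bound above
    return text

try:
    stopwords.words("english")
except AttributeError as err:
    print(err)  # 'function' object has no attribute 'words'

# Fix: restore the object and give the helper its own name.
stopwords = make_stopwords_module()

def strip_stopwords(text):
    stop = set(stopwords.words("english"))
    return " ".join(w for w in text.split() if w.lower() not in stop)

print(strip_stopwords("the cat and a dog"))
```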

How to filter a sentence based on a list of the allowed words in Python?

I have allow_wd as the words to search for. sentench is an array from the main database. Required output: Please help ...
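The required output is not shown, so here are sketches of two possible readings, using the question's `allow_wd` and `sentench` names with made-up sample data: keep only the allowed words inside each sentence, or keep only the sentences that contain at least one allowed word.

```python
def keep_allowed_words(sentences, allow_wd):
    """Reading 1: remove every word that is not in allow_wd."""
    allowed = {w.lower() for w in allow_wd}
    return [" ".join(w for w in s.split() if w.lower() in allowed) for s in sentences]

def sentences_with_allowed(sentences, allow_wd):
    """Reading 2: keep sentences containing at least one allowed word."""
    allowed = {w.lower() for w in allow_wd}
    return [s for s in sentences if any(w.lower() in allowed for w in s.split())]

sentench = ["the red car", "a blue bike", "green tree"]  # sample data
allow_wd = ["red", "blue", "car"]
print(keep_allowed_words(sentench, allow_wd))
print(sentences_with_allowed(sentench, allow_wd))
```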

Removing Stop Words From Text in R

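I'm having trouble removing stop_words from my text data. The dataset is web-scraped and contains customer reviews, which look like this: I performed the following data manipulation and created a new variable in the data frame; the reviews now look like this: The next step is to remove the stop words, for which I use the following code: The output afterwards is: I tried a few other approaches, but the results were not what I wanted, because ...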

TypeError: Cannot read property 'removeStopwords' of undefined

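I'm working on a project using stopword and TypeScript, and I get the following error. I tried to debug by removing !string.trim() and replacing it with string.trim(), and got the output 0 0. I also tried giving fixedSpelling the type any, as well as getSentiment(str:string | undefined) ...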