LookupError when removing stop words from a list in a pandas column

Question

I have a dataset of 1 million records, as shown below.

Sample DF1:

  articles_urlToImage   feed_status   status   keyword
  hhtps://rqqkf.com     untagged      tag      the apple,a mobile phone
  hhtps://hqkf.com      tagged        ingore   blackberry, the a phone
  hhtps://hqkf.com      untagged      tag      amazon, an shopping site

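Now I want to remove the standard stop words plus some custom stop words, as shown below.

Custom stop words = ['phone', 'site'] (I have about 35 custom stop words)

Expected output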

  articles_urlToImage   feed_status   status   keyword
  hhtps://rqqkf.com     untagged      tag      apple,mobile
  hhtps://hqkf.com      tagged        ingore   blackberry
  hhtps://hqkf.com      untagged      tag      amazon,shopping

I tried to remove the stop words, but I get the error below.

Code

import nltk
import string
from nltk.corpus import stopwords
stop = stopwords.words('english') 

df1['keyword'] = df1['keyword'].apply(lambda x: [item for item in x if item not in stop])

Error

  /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   3612             if name in self._info_axis:
   3613                 return self[name]
-> 3614             return object.__getattribute__(self, name)
   3615 
   3616     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'split'

Answer 1

You can use:

from nltk.corpus import stopwords
stop = stopwords.words('english') 
custom  = ['phone','site']
#join lists together
stop = custom + stop

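#replace punctuation with spaces, split by whitespace and remove stop words
#regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df1['keyword'] = (df1['keyword'].str.replace(r'[^\w\s]+', ' ', regex=True)
                    .apply(lambda x: [item for item in x.split() if item not in stop]))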
print (df1)
  articles_urlToImage feed_status  status             keyword
0   hhtps://rqqkf.com    untagged     tag     [apple, mobile]
1    hhtps://hqkf.com      tagged  ingore        [blackberry]
2    hhtps://hqkf.com    untagged     tag  [amazon, shopping]
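
With about 1 million rows and a stop list of a few hundred words, it may help to keep the stop words in a set rather than a list, since `item not in stop` is then a hash lookup instead of a linear scan. Below is a minimal sketch of the same cleaning step as a plain function. The names `ENGLISH_STOP`, `CUSTOM_STOP`, `STOP` and `remove_stop_words` are made up for illustration, and `ENGLISH_STOP` is only a tiny stand-in for the NLTK list so that the snippet runs without the corpus installed.

```python
import re

# Assumption: a tiny stand-in for NLTK's English stop word list, so this
# sketch runs without the corpus. In real use you would build it from
# set(stopwords.words('english')) instead.
ENGLISH_STOP = {"the", "a", "an"}
CUSTOM_STOP = {"phone", "site"}     # custom stop words from the question
STOP = ENGLISH_STOP | CUSTOM_STOP   # set union; membership tests are O(1)


def remove_stop_words(text, stop=STOP):
    """Replace punctuation with spaces, split on whitespace, drop stop words."""
    tokens = re.sub(r"[^\w\s]+", " ", text).split()
    return [tok for tok in tokens if tok not in stop]


# The three keyword strings from the sample data in the question
keywords = [
    "the apple,a mobile phone",
    "blackberry, the a phone ",
    "amazon, an shopping site",
]
cleaned = [remove_stop_words(k) for k in keywords]
print(cleaned)
```

On a DataFrame this would be applied as `df1['keyword'].apply(remove_stop_words)`. If you want the comma-separated strings from the expected output rather than lists, wrap the result in `','.join(...)`. Note that the text has to be split into words first: iterating over the string directly, as in the question's list comprehension, walks over single characters. If `stopwords.words('english')` itself raises a LookupError, the NLTK stopwords corpus is probably not installed; running `nltk.download('stopwords')` once should fetch it.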

Solution 1
0 2018-12-18 06:35:36
