
KeyError when iterating over a pandas DataFrame

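I have a DataFrame df with two columns, 'label' and 'review'. As part of data cleaning, I dropped all null values. Now I want to remove all stopwords and punctuation from the review column.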


When I run this code, I get a KeyError:

    stemmer = PorterStemmer()
    for i in range(len(df)):
        review = re.sub('[^a-zA-Z]', ' ',df['review'][i] )
        review = review.lower()
        review = review.split()
        review = [ stemmer.stem(word) for word in review if word not in stopwords.words('english')]
        df['review'][i] = " ".join(review)
    


    KeyError                                  Traceback (most recent call last)
    <ipython-input-44-91ef309cd900> in <module>
          2 
          3 for i in range(len(df)):
     ----> 4     review = re.sub('[^a-zA-Z]', ' ',df['review'][i] )
          5     review = review.lower()
          6     review = review.split()

    ~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
        866         key = com.apply_if_callable(key, self)
        867         try:
    --> 868             result = self.index.get_value(self, key)
        869 
        870             if not is_scalar(result):

    ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
       4373         try:
       4374             return self._engine.get_value(s, k,
     -> 4375                                           tz=getattr(series.dtype, 'tz', None))
       4376         except KeyError as e1:
       4377             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

    KeyError: 140
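A minimal reproduction may help show where the error comes from. It assumes a small hypothetical DataFrame in place of the real data. dropna() keeps the original index labels, and df['review'][i] looks rows up by label rather than by position. As a result, range(len(df)) eventually asks for a label that was dropped. The sketch also shows two possible ways around it: renumbering with reset_index(drop=True), or looping over the labels that actually exist and writing with .at.

```python
import re

import pandas as pd

# Hypothetical toy data: the middle review is missing, as in the question
df = pd.DataFrame({
    "label": [1, 0, 1],
    "review": ["Great movie!", None, "Terrible, boring plot."],
})
df = df.dropna()
print(list(df.index))  # [0, 2]: dropna keeps the original labels

# range(len(df)) yields 0 and 1, but label 1 was dropped
missing = None
try:
    for i in range(len(df)):
        _ = df["review"][i]  # label lookup on an integer index, not positional
except KeyError as exc:
    missing = exc.args[0]
print("KeyError:", missing)  # KeyError: 1

# Fix A: renumber the index so labels match positions again
renumbered = df.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1]

# Fix B: iterate over the labels that actually exist and write with .at
for label in df.index:
    text = re.sub("[^a-zA-Z]", " ", df.at[label, "review"]).lower()
    df.at[label, "review"] = " ".join(text.split())
print(df["review"].tolist())  # ['great movie', 'terrible boring plot']
```

In the question, label 140 plays the role of label 1 here.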

Please help.

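The KeyError comes from how the loop indexes the column. dropna() keeps the original index labels, so after dropping nulls the index has gaps. df['review'][i] looks rows up by label, not by position. Once range(len(df)) reaches a label that was dropped (here 140), the lookup fails.

Below is a solution without a loop. In pandas, use loops only as a last resort: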

    df['review'] = df['review'].replace('[^a-zA-Z]', ' ', regex=True)
    df['review'] = df['review'].str.lower()
    df['review'] = df['review'].str.split()
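The three vectorized steps above leave each review as a list of tokens, so the stopword removal the question asks for still has to be done. Here is a minimal sketch of finishing the job without an explicit index loop. It uses toy data and a small hypothetical STOP_WORDS set as a stand-in for NLTK's stopwords.words('english').

```python
import pandas as pd

# Hypothetical stand-in for nltk.corpus.stopwords.words('english');
# a set gives fast membership tests and is built only once
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "and"}

# Hypothetical toy data
df = pd.DataFrame({
    "label": [1, 0],
    "review": ["This movie was GREAT!", "An awful, boring plot."],
})

# Same three vectorized steps: strip non-letters, lowercase, tokenize
df["review"] = df["review"].replace("[^a-zA-Z]", " ", regex=True)
df["review"] = df["review"].str.lower()
df["review"] = df["review"].str.split()

# Drop stopwords from each token list, then join back into one string per row
df["review"] = df["review"].apply(
    lambda tokens: " ".join(w for w in tokens if w not in STOP_WORDS)
)
print(df["review"].tolist())  # ['movie great', 'awful boring plot']
```

Building the stopword set once is worthwhile. The question's loop calls stopwords.words('english') for every word, which reloads the list each time. To keep the stemming from the question, the generator could yield stemmer.stem(w) instead of w, assuming NLTK's PorterStemmer is installed.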


