Pandas KeyError when trying to access a row by index

Question

I have a dataframe named clean, which I split into two samples, train_data and test_data, with the following code:

train_data = clean.sample(frac=0.75)
test_data = clean.drop(train_data.index)

I am trying to build a word-frequency dataframe from the train_data dataframe. I started with this code:

from collections import defaultdict as dct

phrases = []
for word in train_data['Message']:
    phrases.append(word.split())
    
ham = dct(int)
spam = dct(int)
    
for i in range(len(phrases)):
    if train_data['Category'][i] == 'ham':
        print(train_data['Category'][i])
    elif train_data['Category'][i] == 'spam':
        print(train_data['Category'][i])

But it gives me an error at if train_data['Category'][i] == 'ham': whenever the index i is not in train_data:

KeyError                                  Traceback (most recent call last)
~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 5

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-97-17de52f682b3> in <module>
      9 
     10 for i in range(len(phrases)):
---> 11     if train_data['Category'][i] == 'ham':
     12         print(train_data['Category'][i])
     13     elif train_data['Category'][i] == 'spam':

~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 5

train_data looks like this (first 20 rows):

  Category                                            Message
1635      ham  you have come into my life and brought the sun...
3724      ham                    nothing splwat abt u and whr ru
1531      ham        oh dang i didnt mean o send that to you lol
1672     spam  urgent we are trying to contact u todays draw ...
2022     spam  u can win å100 of music gift vouchers every we...
4889      ham  sounds like there could be a lot of time spent...
4526      ham  understand his loss is my gain  so do you work...
1827      ham  hey gorgeous man my work mobile number is have...
3835      ham                   then ì_ come n pick me at 530 ar
342       ham                       where u been hiding stranger
2040      ham        you always make things bigger than they are
1788      ham                        arun can u transfr me d amt
860       ham                  in work now going have in few min
2298      ham  dont pick up d call when something important i...
763       ham  nothing but we jus tot u would ask cos u ba gu...
2475      ham                      mm i am on the way to railway
5156      ham  sir i need velusamy sirs date of birth and com...
164      spam  bangbabes ur order is on the way u should rece...
3671      ham   came to look at the flat seems ok in his 50s ...
4302      ham                                        yup im free

What is the problem?

Answer 1

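See the documentation for .loc and .iloc.

You could try if train_data['Category'].iloc[i] == 'ham', which looks the row up by position instead of by label.

The modified code would be: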

for i in range(len(phrases)):
    if train_data['Category'].iloc[i] == 'ham':
        print(train_data['Category'].iloc[i])
    elif train_data['Category'].iloc[i] == 'spam':
        print(train_data['Category'].iloc[i])
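
To make the difference between label-based and position-based lookup concrete, here is a minimal sketch. The Series values and index labels are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical data: the index labels mimic a shuffled sample, so they are not 0..n-1.
category = pd.Series(['ham', 'spam', 'ham'], index=[1635, 3724, 1531], name='Category')

first_by_position = category.iloc[0]  # position 0, whatever its label is
by_label = category.loc[3724]         # the row whose label is 3724

# Plain [] with an integer on an integer index is a label lookup,
# so asking for 0 fails because no row carries the label 0.
try:
    category[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
```

Here .iloc[0] returns the first row regardless of its label, while category[0] raises KeyError, which is the same failure as in the question.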

Answer 2

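KeyError: 5 means that no row with the index label 5 exists. This happens because .sample() keeps the index labels of the original DF, and row 5 may not have been selected.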

Example DF:

   letter
0     A
1     B
2     C
3     D
4     E
5     F

sampled = df.sample(frac=0.5)

   letter
3    D
1    B
4    E

If you try to iterate over the sample with for x in range(...), the label 0 does not exist and the lookup raises an error.

You can call .reset_index(drop=True) after .sample(); drop=True discards the old labels instead of keeping them as an extra column:

sampled = df.sample(frac=0.5).reset_index(drop=True)

   letter
0    D
1    B
2    E
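
Applied to the split in the question, this might look like the sketch below. The tiny clean frame and the random_state value are assumptions for illustration only. Note that test_data is derived before the index is reset, because drop relies on the original labels.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `clean` DataFrame.
clean = pd.DataFrame({
    'Category': ['ham', 'spam', 'ham', 'ham'],
    'Message': ['see you soon', 'win a prize now', 'ok', 'call me later'],
})

# random_state is only here to make the sketch repeatable.
train_data = clean.sample(frac=0.75, random_state=0)
# drop() matches on the original labels, so build test_data first.
test_data = clean.drop(train_data.index)

# drop=True discards the old labels instead of keeping them as an 'index' column.
train_data = train_data.reset_index(drop=True)
test_data = test_data.reset_index(drop=True)

# Label-based access with 0..n-1 now works.
labels = [train_data['Category'][i] for i in range(len(train_data))]
```

After the reset, the loop from the question no longer raises, since every i in range(len(train_data)) is a valid label.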

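In any case, some tips:

Don't iterate over the rows of a DF. Try to use vectorized operations instead: How to iterate over rows in a DataFrame in Pandas
To build a word-frequency dictionary, you can use Counter from collections: https://docs.python.org/3/library/collections.html#collections.Counter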
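
As a rough sketch of both tips together, the per-category word counts could be built as follows. The sample frame and the helper name word_counts are hypothetical. A boolean mask selects the rows, so no index arithmetic is involved, and the only loop runs over the message strings themselves.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for train_data; the index labels are deliberately not 0..n-1.
train_data = pd.DataFrame(
    {
        'Category': ['ham', 'spam', 'ham'],
        'Message': ['are you free', 'win a free prize', 'you are late'],
    },
    index=[1635, 1672, 3724],
)


def word_counts(frame, category):
    """Count words across all messages of one category (illustrative helper)."""
    # The boolean mask picks the matching rows whatever their labels are.
    messages = frame.loc[frame['Category'] == category, 'Message']
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts


ham = word_counts(train_data, 'ham')
spam = word_counts(train_data, 'spam')
```

Because the rows are never addressed by an integer position or label, this works the same whether or not the index has been reset.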
