Pandas KeyError when trying to access row index
I have a dataframe called clean, which I then split into two samples, train_data and test_data, with the following code:
train_data = clean.sample(frac=0.75)
test_data = clean.drop(train_data.index)
I am trying to build a word-frequency dataframe from the train_data dataframe. I started with this code:
from collections import defaultdict as dct
phrases = []
for word in train_data['Message']:
    phrases.append(word.split())
ham = dct(int)
spam = dct(int)
for i in range(len(phrases)):
    if train_data['Category'][i] == 'ham':
        print(train_data['Category'][i])
    elif train_data['Category'][i] == 'spam':
        print(train_data['Category'][i])
But it raises an error at if train_data['Category'][i] == 'ham': whenever the index i is not in train_data:
KeyError Traceback (most recent call last)
~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 5
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-97-17de52f682b3> in <module>
9
10 for i in range(len(phrases)):
---> 11 if train_data['Category'][i] == 'ham':
12 print(train_data['Category'][i])
13 elif train_data['Category'][i] == 'spam':
~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in __getitem__(self, key)
851
852 elif key_is_scalar:
--> 853 return self._get_value(key)
854
855 if is_hashable(key):
~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
959
960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
962 return self.index._get_values_for_loc(self, loc, label)
963
~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 5
train_data looks like this (first 20 rows):
Category Message
1635 ham you have come into my life and brought the sun...
3724 ham nothing splwat abt u and whr ru
1531 ham oh dang i didnt mean o send that to you lol
1672 spam urgent we are trying to contact u todays draw ...
2022 spam u can win å100 of music gift vouchers every we...
4889 ham sounds like there could be a lot of time spent...
4526 ham understand his loss is my gain so do you work...
1827 ham hey gorgeous man my work mobile number is have...
3835 ham then ì_ come n pick me at 530 ar
342 ham where u been hiding stranger
2040 ham you always make things bigger than they are
1788 ham arun can u transfr me d amt
860 ham in work now going have in few min
2298 ham dont pick up d call when something important i...
763 ham nothing but we jus tot u would ask cos u ba gu...
2475 ham mm i am on the way to railway
5156 ham sir i need velusamy sirs date of birth and com...
164 spam bangbabes ur order is on the way u should rece...
3671 ham came to look at the flat seems ok in his 50s ...
4302 ham yup im free
What is the problem?
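KeyError: 5 means that no row with the index label 5 exists. train_data['Category'][i] looks the row up by its label, not by its position, and .sample() keeps the index labels of the original DF, so row 5 may simply not have been selected.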
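A quick way to confirm this is to sample a small frame and inspect its index labels. This sketch uses a made-up six-row DataFrame (not the asker's data) and a fixed random_state so runs are repeatable:

```python
import pandas as pd

# Hypothetical six-row frame with the default index 0..5
df = pd.DataFrame({"letter": list("ABCDEF")})

# sample() returns rows with their ORIGINAL index labels, in shuffled order
sampled = df.sample(frac=0.5, random_state=0)
print(sampled.index.tolist())

# Every label in the sample comes from the original index...
assert set(sampled.index) <= set(df.index)

# ...but the labels are generally not 0..n-1, so a label-based lookup
# such as sampled["letter"][i] fails for any i that was not drawn.
missing = [i for i in range(len(sampled)) if i not in sampled.index]
print("labels in range(len(sampled)) that are missing:", missing)
```

Which labels end up missing depends on the random draw, which is exactly why the original loop fails only for some values of i.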
Example DF:
letter
0 A
1 B
2 C
3 D
4 E
5 F
sampled = df.sample(frac=0.5)
letter
3 D
1 B
4 E
If you then iterate over the sample with for x in range(...), labels such as 0 do not exist and the lookup raises an error.
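If you really do need to walk a sampled frame by position, one option is positional indexing with .iloc, which ignores the labels entirely. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"letter": list("ABCDEF")})
sampled = df.sample(frac=0.5, random_state=1)

# .iloc[i] selects the i-th row by position, whatever its label is,
# so this loop never raises KeyError.
for i in range(len(sampled)):
    print(i, sampled["letter"].iloc[i])

# Alternatively, iterate over (label, value) pairs and skip indexing altogether.
for label, letter in sampled["letter"].items():
    print(label, letter)
```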
You can call .reset_index() after .sample(). Pass drop=True so the old labels are discarded instead of being kept as an extra "index" column:
sampled = df.sample(frac=0.5).reset_index(drop=True)
letter
0 D
1 B
2 E
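reset_index() accepts a drop parameter that decides what happens to the old labels. This sketch (hypothetical data, fixed random_state) compares both settings:

```python
import pandas as pd

df = pd.DataFrame({"letter": list("ABCDEF")})
sampled = df.sample(frac=0.5, random_state=0)

# Without drop=True the old labels are kept as a new "index" column
kept = sampled.reset_index()
print(kept.columns.tolist())   # ['index', 'letter']

# With drop=True the old labels are discarded and the index becomes 0..n-1
renumbered = sampled.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# The first sampled row is now reachable as label 0
print(renumbered["letter"][0])
```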
Anyway, a few tips:
Don't iterate over the rows of a DF. Try to use vectorized operations instead: How to iterate over rows in a DataFrame in Pandas
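For the question's case, the per-row if/elif can be replaced by boolean masks on the Category column. A sketch on a few rows with deliberately non-contiguous labels, as sample() would produce; the spam text is made up:

```python
import pandas as pd

# Hypothetical stand-in for train_data; labels are gappy, as after sample()
train_data = pd.DataFrame(
    {
        "Category": ["ham", "spam", "ham"],
        "Message": [
            "oh dang i didnt mean o send that to you lol",
            "win a prize now",  # made-up spam text
            "yup im free",
        ],
    },
    index=[1531, 17, 4302],
)

# Whole-column comparisons produce boolean masks aligned on the index,
# so the actual label values never matter
is_ham = train_data["Category"] == "ham"
is_spam = train_data["Category"] == "spam"

ham_messages = train_data.loc[is_ham, "Message"]
spam_messages = train_data.loc[is_spam, "Message"]

# Row counts per category, also without a Python loop
print(train_data["Category"].value_counts())
```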
To build a word-frequency dictionary, you can use Counter from collections: https://docs.python.org/3/library/collections.html#collections.Counter
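Combining the two tips, per-category word counts can be built with Counter over the split messages, with no positional indexing at all. This sketch assumes whitespace splitting (as in the question's word.split()) is good enough tokenization; the rows are hypothetical:

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for train_data with non-contiguous labels
train_data = pd.DataFrame(
    {
        "Category": ["ham", "spam", "ham"],
        "Message": ["yup im free", "free prize call now", "im on the way"],
    },
    index=[4302, 17, 2475],
)

# One Counter per category, fed by whitespace-split messages
word_counts = {}
for category, messages in train_data.groupby("Category")["Message"]:
    counter = Counter()
    for message in messages:
        counter.update(message.split())
    word_counts[category] = counter

# Optional: turn the counters into a word-frequency DataFrame
# (words as rows, categories as columns, 0 where a word never appears)
freq = pd.DataFrame(word_counts).fillna(0).astype(int)
print(freq)
```

groupby hands each category's messages over as a Series, so the labels left behind by sample() play no role.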