'KeyError:' when iterating over a Pandas data frame?

Question

I have two lists, Y_train and Y_test. At the moment they hold categorical data. Each element is either Blue or Green. They are going to be the targets for a Random Forest classifier. I need them encoded as 1.0s and 0.0s.

Here is a print(Y_train) to show you what the data frame looks like. The random numbers down the side are because the data has been shuffled. (Y_test is the same, just smaller):

183      Blue
126      Blue
1        Blue
409      Blue
575    Green
         ...   
396      Blue
192      Blue
578    Green
838    Green
222      Blue
Name: Colour, Length: 896, dtype: object

To encode this I was going to simply loop over them and change each element to its encoded value:

for i in range(len(Y_train)):
    if Y_train[i] == 'Blue':
        Y_train[i] = 0.0
    else:
        Y_train[i] = 1.0

However, when I do this, I get the following:

Traceback (most recent call last):
  File "G:\Work\Colours.py", line 90, in <module>
    Main()
  File "G:\Work\Colours.py", line 34, in Main
    RandForest(X_train, Y_train, X_test, Y_test)
  File "G:\Work\Colours.py.py", line 77, in RandForest
    if Y_train[i] == 'Blue':
  File "C:\Users\Me\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\Me\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 992, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 6

The weird thing is that it produces this error at different times. I've used flags and prints to see how far it gets. Sometimes it will get quite a few iterations into the loop, and other times it will only do one or two iterations before breaking.

I'm assuming I just don't quite understand how you're supposed to iterate over data frames properly. If someone with more experience with this stuff could help me out, that would be great.

Answer 1

Try:

Y_train[Y_train == 'Blue'] = 0.0
Y_train[Y_train == 'Green'] = 1.0

That should solve your issues.
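For context on why the loop in the question fails: the Series keeps its shuffled index labels, so `Y_train[i]` looks up the row *labelled* `i`, not the i-th row. Any label that is missing from `Y_train` (presumably because that row ended up in the test split) raises `KeyError`, and which label is missing first changes with each shuffle. Here is a minimal sketch of that behaviour, assuming a small made-up Series `y` standing in for `Y_train`; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for Y_train: integer index labels left over from a shuffle.
y = pd.Series(["Blue", "Green", "Blue", "Green"],
              index=[183, 1, 575, 3], name="Colour")

# y[0] is a label lookup, and no row is labelled 0, so it raises KeyError.
try:
    y[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc is positional, so it works whatever the labels are.
first_value = y.iloc[0]

# Vectorised encoding without a loop; the original Series is left untouched.
encoded = y.map({"Blue": 0.0, "Green": 1.0})
```

With `map` the result comes back with a float dtype and the same index as the input, so it stays aligned with the feature rows.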

Answer 2

In cases where you have more labels than in your current example (Blue and Green in your case), sklearn provides a label encoder that allows you to do this very easily:

from sklearn import preprocessing 

label_encoder = preprocessing.LabelEncoder() 

# Transforms the 'column' in your dataframe df
df['column']= label_encoder.fit_transform(df['column'])
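To see what the encoder actually produces, here is a small sketch on a made-up data frame (the column name `Colour` and the extra `Red` label are assumptions for illustration). It shows that the codes follow the sorted order of the labels, and that `classes_` and `inverse_transform` let you get back to the original strings.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data with three labels instead of two.
df = pd.DataFrame({"Colour": ["Blue", "Green", "Red", "Blue"]})

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(df["Colour"])

# classes_ holds the labels in sorted order; a label's position is its code.
classes = list(label_encoder.classes_)

# inverse_transform maps the integer codes back to the original labels.
decoded = list(label_encoder.inverse_transform(codes))
```

The codes are integers, so if you need floats as in the question you can call `codes.astype(float)` afterwards.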

Answer 3

If you are using your own method for label encoding, it is better to create a separate encoded column rather than modifying the original column. After that you can assign the encoded column to your dataframe. As an example for your scenario:

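import numpy as np

# Start with every row encoded as 1 (Green), then set the Blue rows to 0.
encoded = np.ones((Y_train.shape[0], 1))
for i in range(Y_train.shape[0]):
    # .iloc is positional, so the shuffled index labels do not cause a KeyError
    if Y_train.iloc[i] == 'Blue':
        encoded[i] = 0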
Note that this will only work if you have two categories.

For multiple categories, you can use sklearn or pandas methods.

For multiple categories

Another approach is using pandas cat.codes. You can convert a pandas Series to a category and get the category codes.

import pandas as pd

Y_train = pd.Series(Y_train)
encoded = Y_train.astype("category").cat.codes
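As a quick illustration of what `cat.codes` returns, here is a sketch on a small made-up Series (`y` and `code_to_label` are names chosen for this example). It also builds the code-to-label lookup, which is handy when you need to read predictions back as colours.

```python
import pandas as pd

# Hypothetical stand-in for Y_train with shuffled index labels.
y = pd.Series(["Blue", "Green", "Blue", "Green"], index=[183, 1, 575, 3])

as_category = y.astype("category")

# Integer codes, one per row, carrying the same index as the input.
# Missing values, if there were any, would be encoded as -1.
codes = as_category.cat.codes

# Categories are stored in sorted order; a category's position is its code.
code_to_label = dict(enumerate(as_category.cat.categories))
```

Because the index is preserved, `codes` can be assigned straight back to the data frame as a new column.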

You can use sklearn LabelEncoder to encode categorical data as well.

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
encoded = le.fit_transform(Y_train)
