Error while trying to impute missing values using pandas' 'loc' function in Python

I am trying to impute missing values in one of the columns of my dataset using the 'loc' function of the pandas library, but the code does not execute successfully. The line of code is below.

# Impute missing data by mean weight of each sub-category in 'Item_Weight' column

data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])

The error it generates is as follows:

data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])
Traceback (most recent call last):

  File "<ipython-input-3-168be6231060>", line 1, in <module>
    data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])

  File "C:\Users\Arnab\Anaconda3\lib\site-packages\pandas\core\series.py", line 3192, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)

  File "pandas/_libs/src\inference.pyx", line 1472, in pandas._libs.lib.map_infer

  File "<ipython-input-3-168be6231060>", line 1, in <lambda>
    data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])

  File "C:\Users\Arnab\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2685, in __getitem__
    return self._getitem_column(key)

  File "C:\Users\Arnab\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2692, in _getitem_column
    return self._get_item_cache(key)

  File "C:\Users\Arnab\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2486, in _get_item_cache
    values = self._data.get(item)

  File "C:\Users\Arnab\Anaconda3\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)

  File "C:\Users\Arnab\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3065, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'FDP10'

I observed that the last line of the error trace says KeyError: 'FDP10'.

FDP10 is precisely the first value in the 'Item_Identifier' column for which the corresponding cell in the 'Item_Weight' column is blank (i.e. has a missing value).

So it appears that the code fails at the first blank cell it hits, instead of filling that cell with the replacement value.

An alternative I found is:

data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight.at[x,'Item_Weight'])

The logic behind this alternative also appears sound to me. But my question is: what is wrong with the original code?

Let me know if you need any additional info and I will provide it!

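On analysis, 4 products in the training dataset turn out to have only a single row, and that row has a missing weight, so a possible solution is to refer to the test dataset, which contains the item weights for those 4 products.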

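This has to do with the use of pivot_table. It returns a DataFrame, so item_avg_weight[x] looks for a column named x (here 'FDP10') rather than a row, which is what raises the KeyError.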
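
A minimal sketch, using made-up weights (only the column names and the FDP10 identifier come from the question), of how plain [] and .at behave on the pivot_table result:

```python
import numpy as np
import pandas as pd

# Toy data: the weights are invented for illustration only.
data = pd.DataFrame({
    "Item_Identifier": ["FDP10", "FDP10", "DRI11", "DRI11"],
    "Item_Weight": [19.0, np.nan, 8.26, np.nan],
})

# pivot_table returns a DataFrame with one 'Item_Weight' column,
# indexed by Item_Identifier.
item_avg_weight = data.pivot_table(values="Item_Weight", index="Item_Identifier")

# Plain [] on a DataFrame selects a COLUMN, and there is no column 'FDP10'.
try:
    item_avg_weight["FDP10"]
    raised = False
except KeyError:
    raised = True

# .at takes a (row label, column label) pair, so this lookup succeeds.
by_label = item_avg_weight.at["FDP10", "Item_Weight"]
```

This would explain why the .at version from the question works while the plain [] version does not.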

Use:

item_avg_weight = df.groupby('Item_Identifier')['Item_Weight'].mean()

instead of:

item_avg_weight = df.pivot_table(values='Item_Weight', index='Item_Identifier')

That way item_avg_weight is a Series indexed by Item_Identifier, so you do not have to use the at accessor and can simply write:

df.loc[miss_bool,'Item_Weight'] = df.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])
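
As a rough end-to-end sketch of the Series-based approach, here is a small runnable example. The data is invented for illustration, miss_bool is assumed to be the null mask of Item_Weight (the question does not show how it was built), and Series.map is swapped in for apply with a lambda, since it performs the same per-identifier lookup:

```python
import numpy as np
import pandas as pd

# Toy data: the weights are invented for illustration only.
df = pd.DataFrame({
    "Item_Identifier": ["FDP10", "FDP10", "DRI11", "DRI11", "DRI11"],
    "Item_Weight": [19.0, np.nan, 8.0, 9.0, np.nan],
})

# Assumed definition of the mask: True where the weight is missing.
miss_bool = df["Item_Weight"].isnull()

# Selecting the column before aggregating yields a Series indexed by
# Item_Identifier, so lookups by identifier work directly.
item_avg_weight = df.groupby("Item_Identifier")["Item_Weight"].mean()

# Fill each missing weight with the mean weight of its identifier.
df.loc[miss_bool, "Item_Weight"] = df.loc[miss_bool, "Item_Identifier"].map(item_avg_weight)
```

One difference worth noting: map leaves NaN for an identifier that is absent from item_avg_weight, whereas the lambda lookup would raise a KeyError for it.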

