I'm having a lot of trouble with this, and Python keeps giving me totally unhelpful error messages.
Here is the code so far:
# extract continuous features into a separate variable
continuous_feats = data['age', 'fnlwgt', 'educational-num', 'capital-gain',
                        'capital-loss', 'hours-per-week']
# normalize the continuous features and turn them
# into numpy arrays
for feature in continuous_feats:
    continuous_feats[feature] = (continuous_feats[feature] - continuous_feats[feature].mean())/continuous_feats[feature].std()
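The loop above z-scores each column (subtract its mean, divide by its standard deviation). As a sketch, assuming a small hypothetical DataFrame in place of the real `data`, the same normalization can be written as one vectorized pandas expression: `mean()` and `std()` return one value per column, and DataFrame arithmetic aligns those Series with the columns by label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real "data" DataFrame.
data = pd.DataFrame({
    "age": [25, 38, 28, 44],
    "hours-per-week": [40, 50, 40, 30],
    "workclass": ["Private", "Private", "Local-gov", "Private"],
})
cols = ["age", "hours-per-week"]

# Column-by-column z-score, as in the loop (on an explicit copy of the selection).
looped = data[cols].copy()
for feature in cols:
    looped[feature] = (looped[feature] - looped[feature].mean()) / looped[feature].std()

# Vectorized equivalent: the per-column means and standard deviations
# are broadcast across rows by matching column labels.
subset = data[cols]
vectorized = (subset - subset.mean()) / subset.std()

print(np.allclose(looped.to_numpy(), vectorized.to_numpy()))  # True
```

Both versions use pandas' default `ddof=1` for the standard deviation, so they agree exactly up to floating-point rounding.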
This is what I'm trying to do:
I have a DataFrame called "data". It contains some columns I want to extract, whose headings are listed in continuous_feats; then I want to normalize them (which I'm currently doing in the loop), and finally convert them into a NumPy array. I don't want to make copies of anything, and afterwards "data" should no longer contain any of these columns.
If there is a faster alternative, I'm all ears. But no matter what I try, I just get this traceback:
Traceback (most recent call last):
File "/usr/localnfs/Compiler/python/Anaconda3-2019.07/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('age', 'fnlwgt', 'educational-num', 'capital-gain', 'capital-loss', 'hours-per-week')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 97, in <module>
continuous_feats = data['age', 'fnlwgt', 'educational-num', 'capital-gain',
File "/usr/localnfs/Compiler/python/Anaconda3-2019.07/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "/usr/localnfs/Compiler/python/Anaconda3-2019.07/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('age', 'fnlwgt', 'educational-num', 'capital-gain', 'capital-loss', 'hours-per-week')
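The whole workflow described in the question (pull the columns out of `data`, convert them to NumPy, normalize) could look roughly like the sketch below. It assumes a small hypothetical DataFrame with only a few of the real columns. `DataFrame.pop` removes each column from `data` in place and returns it as a Series, so `data` ends up without those columns. Note that `np.column_stack` builds a new array, so this is not copy-free; it only avoids keeping the columns in `data` as well.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real "data" has more rows and columns.
data = pd.DataFrame({
    "age": [25, 38, 28, 44],
    "fnlwgt": [226802, 89814, 336951, 160323],
    "workclass": ["Private", "Private", "Local-gov", "Private"],
    "hours-per-week": [40, 50, 40, 30],
})
continuous_cols = ["age", "fnlwgt", "hours-per-week"]

# pop() deletes each column from `data` and returns it; stack the
# resulting 1-D float arrays side by side into an (n_rows, n_cols) array.
X = np.column_stack([data.pop(col).to_numpy(dtype=float) for col in continuous_cols])

# z-score each column. ddof=1 matches pandas' Series.std() default,
# whereas NumPy's std() defaults to ddof=0.
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(list(data.columns))  # ['workclass']
print(X.shape)             # (4, 3)
```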
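If you want to extract a subset of columns from a dataframe, you need to pass the column names as a list. Without the inner brackets, Python packs the comma-separated names into a single tuple, and pandas looks up that whole tuple as one column label, which is why the KeyError shows the tuple. The order of the columns in the resulting dataframe will be the same as the order in the list.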
continuous_feats = data[['age', 'fnlwgt', 'educational-num', 'capital-gain',
'capital-loss', 'hours-per-week']]
More details and options for slicing and dicing dataframes can be found in the pandas documentation on indexing and selecting data.
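To see the difference between the two forms, here is a minimal sketch on a hypothetical three-column DataFrame: the comma form is looked up as one tuple label and fails, while the list form selects several columns in the order given.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# df["c", "a"] is the same as df[("c", "a")]: one tuple key,
# which is not a column label here, so pandas raises KeyError.
raised = False
try:
    df["c", "a"]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# A list selects several columns, in the order given by the list.
subset = df[["c", "a"]]
print(list(subset.columns))  # ['c', 'a']
```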
You need double square brackets when you are extracting a list of columns from a dataframe. Please try the code below.
continuous_feats = data[['age', 'fnlwgt', 'educational-num', 'capital-gain', 'capital-loss', 'hours-per-week']]
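The "double brackets" are not special syntax: the outer pair is the indexing operator and the inner pair is an ordinary Python list. A small sketch with hypothetical data shows that a single label returns a Series, a list (even of one label) returns a DataFrame, and a list stored in a variable works the same way.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 38], "hours-per-week": [40, 50]})

single = df["age"]                    # one label      -> Series
as_frame = df[["age"]]                # list of one    -> one-column DataFrame
both = df[["age", "hours-per-week"]]  # list of two    -> two-column DataFrame

cols = ["age", "hours-per-week"]
same = df[cols]                       # list held in a variable: same result

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(both.shape)               # (2, 2)
print(same.equals(both))        # True
```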