
Python Pandas extract column from dataframe and delete

I'm having a lot of trouble with this, and the interpreter keeps giving me error messages that I find totally unhelpful.

Here is the code so far:

# extract continuous features into a separate variable
continuous_feats = data['age', 'fnlwgt', 'educational-num', 'capital-gain',
                    'capital-loss', 'hours-per-week']

# normalize the continuous features and turn them
# into numpy arrays
for feature in continuous_feats:
    continuous_feats[feature] = (continuous_feats[feature] -  continuous_feats[feature].mean())/continuous_feats[feature].std()

This is what I'm trying to do:

I have an object called "data" which is of type DataFrame. It contains some columns I want to extract, whose headings are listed in continuous_feats, then I want to normalize them (which I'm currently doing in the loop), and finally I want to convert them into a numpy array. I don't want to make copies of anything. The object "data" should not contain any of these columns.

If there is a faster alternative I'm all ears. But no matter what I try, I just get a bunch of garbage:

Traceback (most recent call last):
 File "/usr/localnfs/Compiler/python/Anaconda3-2019.07/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
 return self._engine.get_loc(key)
 File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
 File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
 File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
 File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('age', 'fnlwgt', 'educational-num', 'capital-gain', 'capital-loss', 'hours-per-week')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "main.py", line 97, in <module>
   continuous_feats = data['age', 'fnlwgt', 'educational-num', 'capital-gain',
 File "/usr/localnfs/Compiler/python/Anaconda3-2019.07/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
   indexer = self.columns.get_loc(key)
 File "/usr/localnfs/Compiler/python/Anaconda3-2019.07/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
   return self._engine.get_loc(self._maybe_cast_indexer(key))
 File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
 File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
 File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
 File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('age', 'fnlwgt', 'educational-num', 'capital-gain', 'capital-loss', 'hours-per-week')

If you want to extract a subset of columns from a dataframe, you need to pass the column names as a list. The order of the columns in the resulting dataframe will be the same as the order in the list.

continuous_feats = data[['age', 'fnlwgt', 'educational-num', 'capital-gain',
                    'capital-loss', 'hours-per-week']]

More details and options for slicing and dicing dataframes can be found here.
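For the rest of what the question asks (normalizing, converting to a numpy array, and removing the columns from data), here is a minimal sketch. The sample rows and the "workclass" column are made up for illustration; only the six column names come from the question. Note that selecting with a list generally gives you a new object, so avoiding every copy is probably not realistic.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `data`; the values and the
# "workclass" column are invented, only the six names are from the question.
data = pd.DataFrame({
    "age": [25, 38, 52, 41],
    "fnlwgt": [226802, 89814, 336951, 160323],
    "educational-num": [7, 9, 12, 10],
    "capital-gain": [0, 0, 7688, 0],
    "capital-loss": [0, 0, 0, 1902],
    "hours-per-week": [40, 50, 40, 30],
    "workclass": ["Private", "Private", "Local-gov", "Private"],
})

continuous_cols = ["age", "fnlwgt", "educational-num", "capital-gain",
                   "capital-loss", "hours-per-week"]

# A list inside the brackets selects several columns at once.
continuous_feats = data[continuous_cols]

# Standardize every column in one vectorized step instead of a Python loop.
# DataFrame.std() uses the sample standard deviation (ddof=1), which is
# also what Series.std() does in the loop from the question.
normalized = (continuous_feats - continuous_feats.mean()) / continuous_feats.std()

# Convert to a NumPy array, then remove the columns from `data`.
continuous_array = normalized.to_numpy()
data = data.drop(columns=continuous_cols)
```

After this, continuous_array holds the standardized values and data only has the remaining columns. The column-wise arithmetic should behave the same as the loop, just without per-column assignment.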

You need double square brackets when extracting a list of columns from a dataframe: the outer pair is the indexing operator and the inner pair is the list of column names. Please try the code below.

continuous_feats = data[['age', 'fnlwgt', 'educational-num', 'capital-gain', 'capital-loss', 'hours-per-week']]
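To see why the single-bracket version fails, here is a small sketch on a made-up two-column frame (the frame and its values are hypothetical). Commas inside one pair of brackets build a tuple, so pandas looks for a single column labelled with that whole tuple, which matches the KeyError shown in the traceback.

```python
import pandas as pd

# Hypothetical two-column frame used only to illustrate the error.
df = pd.DataFrame({"age": [25, 38], "hours-per-week": [40, 50]})

# Commas inside a single pair of brackets build a tuple, so pandas
# searches for one column whose label is ('age', 'hours-per-week').
try:
    df["age", "hours-per-week"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list asks for each named column separately, in the order given.
subset = df[["hours-per-week", "age"]]
```

Here tuple_lookup_failed ends up True, and subset has its columns in the order they appear in the list.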

