How to remove a few columns from a count-vectorized sparse DataFrame in pandas

Question

I have around 2000 text features in a count-vectorized data frame. I also have a list of 800 text feature columns that actually contribute to feature importance in my prediction model. I want to keep only these 800 columns and remove the remaining 1200 columns, as they do not contribute much to my predictions.

How can I do that? I have the list of columns to keep in a text file.

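import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Vectorize the text column, then add one sparse column per vocabulary term.
cv = CountVectorizer(max_features=2000, analyzer='word')
cv_text = cv.fit_transform(data.pop('text'))
for i, col in enumerate(cv.get_feature_names()):
    data[col] = pd.SparseSeries(cv_text[:, i].toarray().ravel(), fill_value=0)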

Answer 1

It should be easy:

data = data.drop(list_of_cols_to_drop, axis=1)

or

data = data.drop(data.columns.difference(list_of_needed_cols), axis=1)

There is a drop method for SparseDataFrame objects.

From the docstring:

In [139]: pd.SparseDataFrame.drop?
Signature: pd.SparseDataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Docstring:
Return new object with labels in requested axis removed.
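
For completeness, here is a minimal sketch of the whole workflow, including reading the keep list from a text file. It is an illustration under assumptions, not the asker's actual setup: the toy column names, the file name keep_columns.txt, and the one-name-per-line file layout are all made up. It also uses a regular DataFrame with sparse columns (pd.arrays.SparseArray), because SparseDataFrame and SparseSeries were removed in pandas 1.0; drop works the same way there. One thing to watch with the data.columns.difference(...) variant: it drops every column not on the keep list, including non-text columns such as a target, so the sketch restricts the difference to the text feature columns.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the count-vectorized frame: one non-text column plus
# four sparse term-count columns (names are made up for illustration).
data = pd.DataFrame({
    "label": [1, 0, 1],
    "apple": pd.arrays.SparseArray([1, 0, 2], fill_value=0),
    "banana": pd.arrays.SparseArray([0, 0, 1], fill_value=0),
    "cherry": pd.arrays.SparseArray([0, 3, 0], fill_value=0),
    "durian": pd.arrays.SparseArray([0, 0, 0], fill_value=0),
})
text_cols = ["apple", "banana", "cherry", "durian"]

# Hypothetical keep-list file with one column name per line.
path = os.path.join(tempfile.mkdtemp(), "keep_columns.txt")
with open(path, "w") as fh:
    fh.write("apple\ncherry\n")

# Read the names to keep, skipping blank lines.
with open(path) as fh:
    keep = [line.strip() for line in fh if line.strip()]

# Drop only the text features that are not on the keep list;
# non-text columns such as "label" are left untouched.
to_drop = pd.Index(text_cols).difference(keep)
data = data.drop(columns=to_drop)
```

In the real case, text_cols would be the vectorizer's feature names, and the surviving columns keep their sparse dtype.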

solution1
0 2017-12-11 15:51:20
