
Pandas: Concatenating DataFrame with Sparse Matrix

I'm doing some basic machine learning and have a sparse matrix resulting from TFIDF as follows:

<983x33599 sparse matrix of type '<type 'numpy.float64'>'
    with 232944 stored elements in Compressed Sparse Row format>

Then I have a DataFrame with a title column. I want to combine these into one DataFrame, but when I try to use concat, I get an error saying that I can't combine a DataFrame with a non-DataFrame object.

How do I get around this?

Thanks!

Consider the following demo:

Source DF:

In [2]: df
Out[2]:
                     text
0       is it  good movie
1  wooow is it very goode
2               bad movie

Solution: let's create a SparseDataFrame out of the TF-IDF sparse matrix:

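import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer(sublinear_tf=True, max_df=0.5, analyzer='word', stop_words='english')

# pd.SparseDataFrame exists only in pandas < 1.0 (it was removed in 1.0)
sdf = pd.SparseDataFrame(vect.fit_transform(df['text']),
                         columns=vect.get_feature_names(),
                         default_fill_value=0)
# add the original text as a regular (dense) column
sdf['text'] = df['text']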

Result:

In [13]: sdf
Out[13]:
   bad  good     goode     wooow                    text
0  0.0   1.0  0.000000  0.000000       is it  good movie
1  0.0   0.0  0.707107  0.707107  wooow is it very goode
2  1.0   0.0  0.000000  0.000000               bad movie

In [14]: sdf.memory_usage()
Out[14]:
Index    80
bad       8
good      8
goode     8
wooow     8
text     24
dtype: int64

PS: pay attention to .memory_usage(): we didn't lose the sparseness. If we used pd.concat, join, merge, etc., we would lose it, as all these methods generate a new regular (non-sparse) copy of the merged DataFrames.
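For readers on newer pandas: pd.SparseDataFrame was removed in pandas 1.0, so the snippet above only runs on older versions. A rough equivalent, as far as I can tell, is pd.DataFrame.sparse.from_spmatrix. The sketch below uses a small hand-made SciPy matrix and a made-up vocabulary in place of the real TfidfVectorizer output (both are assumptions for illustration), so it does not depend on scikit-learn:

```python
import pandas as pd
from scipy import sparse

# Toy stand-in for the TF-IDF output: a 3x4 CSR matrix (values are made up).
tfidf = sparse.csr_matrix([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.7],
    [1.0, 0.0, 0.0, 0.0],
])
feature_names = ["bad", "good", "goode", "wooow"]  # hypothetical vocabulary

df = pd.DataFrame({"text": ["is it  good movie",
                            "wooow is it very goode",
                            "bad movie"]})

# Build a DataFrame whose columns are backed by sparse arrays.
sdf = pd.DataFrame.sparse.from_spmatrix(tfidf,
                                        index=df.index,
                                        columns=feature_names)

# Attach the dense text column by plain assignment, as in the answer above.
sdf["text"] = df["text"]

# The numeric columns should report a Sparse[...] dtype.
print(sdf.dtypes)
```

With a real vectorizer you would pass vect.fit_transform(df['text']) instead of the toy matrix and take the column names from the vectorizer.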

Maybe you can try converting the sparse matrix to a dense one before doing the concatenation (to_dense() on a pandas sparse object; a SciPy matrix spells it todense() or toarray()), and later convert back to sparse with to_sparse(). Hope it helps.
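A minimal sketch of that dense round trip, assuming the matrix is a SciPy CSR matrix (so the call is toarray()) and using made-up column names w0 and w1. Keep in mind that a dense float64 array of the question's shape, 983 x 33599, takes roughly 264 MB, so this only makes sense when memory allows:

```python
import pandas as pd
from scipy import sparse

# Toy stand-ins for the question's objects (values and names are made up).
tfidf = sparse.csr_matrix([[0.0, 1.0],
                           [0.5, 0.0]])
titles = pd.DataFrame({"title": ["first", "second"]})

# Densify the SciPy matrix and wrap it in a regular DataFrame.
dense = pd.DataFrame(tfidf.toarray(),
                     index=titles.index,
                     columns=["w0", "w1"])

# Now both operands are DataFrames, so concat accepts them.
combined = pd.concat([titles, dense], axis=1)

# Convert the numeric part back to a SciPy sparse matrix if needed.
back = sparse.csr_matrix(combined[["w0", "w1"]].to_numpy())
```

The index is passed explicitly so that concat aligns the rows of the two frames one to one.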
