scipy.sparse.csr_matrix row filtering: how to do it properly?

Question

I'm working with some scipy.sparse.csr_matrix objects. To be honest, the one I have at hand comes from scikit-learn's TfidfVectorizer:

vectorizer = TfidfVectorizer(min_df=0.0005)
textsMet2 = vectorizer.fit_transform(textsMet)

OK, so this is a matrix:

textsMet2
<999x1632 sparse matrix of type '<class 'numpy.float64'>'
    with 5042 stored elements in Compressed Sparse Row format>

Now I only want to keep the rows that have at least one non-zero element. So, obviously, I go for simple indexing:

 textsMet2[(textsMet2.sum(axis=1)>0),:]

and get an error:

  File "D:\Apps\Python\lib\site-packages\scipy\sparse\sputils.py", line 327, in _boolean_index_to_array
    raise IndexError('invalid index shape')
IndexError: invalid index shape

If I remove the last part of the index, I get something strange:

textsMet2[(textsMet2.sum(axis=1)>0)]
<1x492 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>

Why does it show a matrix with only 1 row?

Again, I want to get all the rows of this matrix that have any non-zero elements. Does anyone know how to do that?
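One way to see where the index shape comes from is to inspect the mask itself. The sketch below uses a small hypothetical matrix `m` (not the data from the question) and assumes `csr_matrix`, whose `sum(axis=1)` returns a 2-D `numpy.matrix` column rather than a 1-D array, so the comparison result is 2-D as well.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4x3 matrix for illustration; rows 1 and 3 are entirely zero.
m = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 0.0, 0.0],
                         [0.0, 3.0, 0.0],
                         [0.0, 0.0, 0.0]]))

row_sums = m.sum(axis=1)   # numpy.matrix with a single column
mask = row_sums > 0        # boolean, but still 2-D

print(type(row_sums).__name__, row_sums.shape)
print(mask.ndim, mask.shape)
```

A row selector is expected to be one-dimensional, which is why a mask of this shape does not behave like a plain list of row flags.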

Answer 1

You need to ravel your mask. Here is a bit of code from something I'm working on at the moment:

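import logging
import numpy as np

# pipeline, train_text, y_train, fit_params and config come from my own project
tr_matrix = pipeline.fit_transform(train_text, y_train, **fit_params)

# remove documents with too few features
to_keep_train = tr_matrix.sum(axis=1) >= config['min_train_features']
# the comparison yields an (n, 1) matrix; flatten it into a 1-D boolean mask
to_keep_train = np.ravel(np.array(to_keep_train))
logging.info('%d/%d train documents have enough features',
             sum(to_keep_train), len(y_train))
tr_matrix = tr_matrix[to_keep_train, :]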

It's a bit inelegant, but it gets the job done.
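The same idea as a self-contained sketch, applied to the "keep rows with any non-zero element" case from the question. The matrix `m` is a small hypothetical stand-in for the TF-IDF matrix. The second variant, based on `getnnz(axis=1)`, is an alternative not mentioned above: it counts stored entries per row and already returns a 1-D array, so no flattening is needed.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4x3 matrix for illustration; rows 1 and 3 are entirely zero.
m = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 0.0, 0.0],
                         [0.0, 3.0, 0.0],
                         [0.0, 0.0, 0.0]]))

# Variant 1: flatten the (n, 1) boolean result into a 1-D mask, then index rows.
mask = np.ravel(np.asarray(m.sum(axis=1) > 0))
filtered = m[mask, :]

# Variant 2: count stored entries per row; the result is already 1-D.
mask_nnz = m.getnnz(axis=1) > 0
filtered_nnz = m[mask_nnz]

print(filtered.shape, filtered_nnz.shape)
```

The two variants are not always equivalent. The row-sum test assumes non-negative values, as TF-IDF weights are, since positive and negative entries could cancel out. `getnnz` counts explicitly stored zeros as entries, so calling `m.eliminate_zeros()` first may be needed if the matrix can contain them.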

Solution 1
1 Accepted 2015-04-21 16:39:42
