TypeError: Failed to convert object of type SparseTensor to Tensor

Question

I am building a text classification model for the IMDb sentiment analysis dataset. I downloaded the dataset and followed the tutorial here: https://developers.google.com/machine-learning/guides/text-classification/step-4

The error I get is:

TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("DeserializeSparse:0", shape=(None, 2), dtype=int64), values=Tensor("DeserializeSparse:1", shape=(None,), dtype=float32), dense_shape=Tensor("stack:0", shape=(2,), dtype=int64)). Consider casting elements to a supported type.

The type of x_train and x_val is scipy.sparse.csr.csr_matrix, and passing them to a Sequential model raises this error. How can I solve it?

import tensorflow as tf
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

# Vectorization parameters

# Range (inclusive) of n-gram sizes for tokenizing text.
NGRAM_RANGE = (1, 2)

# Limit on the number of features. We use the top 20K features.
TOP_K = 20000

# Whether text should be split into word or character n-grams.
# One of 'word', 'char'.
TOKEN_MODE = 'word'

# Minimum document/corpus frequency below which a token will be discarded.
MIN_DOCUMENT_FREQUENCY = 2

# Limit on the length of text sequences. Sequences longer than this
# will be truncated.
MAX_SEQUENCE_LENGTH = 500


def ngram_vectorize(train_texts, train_labels, val_texts):
    """Vectorizes texts as ngram vectors.
    1 text = 1 tf-idf vector the length of vocabulary of uni-grams + bi-grams.
    # Arguments
        train_texts: list, training text strings.
        train_labels: np.ndarray, training labels.
        val_texts: list, validation text strings.
    # Returns
        x_train, x_val: vectorized training and validation texts
    """
    # Create keyword arguments to pass to the 'tf-idf' vectorizer.
    kwargs = {
            'ngram_range': NGRAM_RANGE,  # Use 1-grams + 2-grams.
            'dtype': 'float32',  # TF-IDF weights are floats; sklearn warns on 'int32' and uses float64.
            'strip_accents': 'unicode',
            'decode_error': 'replace',
            'analyzer': TOKEN_MODE,  # Split text into word tokens.
            'min_df': MIN_DOCUMENT_FREQUENCY,
    }
    vectorizer = TfidfVectorizer(**kwargs)

    # Learn vocabulary from training texts and vectorize training texts.
    x_train = vectorizer.fit_transform(train_texts)

    # Vectorize validation texts.
    x_val = vectorizer.transform(val_texts)

    # Select top 'k' of the vectorized features.
    selector = SelectKBest(f_classif, k=min(TOP_K, x_train.shape[1]))
    selector.fit(x_train, train_labels)
    x_train = selector.transform(x_train)
    x_val = selector.transform(x_val)

    x_train = x_train.astype('float32')
    x_val = x_val.astype('float32')
    return x_train, x_val
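For context, the function above returns SciPy sparse matrices rather than NumPy arrays: TfidfVectorizer produces a CSR matrix, SelectKBest.transform keeps it sparse, and the final .astype('float32') calls only change the element type, not the storage format. A minimal sketch, using a small hand-built CSR matrix as a hypothetical stand-in for the real TF-IDF output:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the TF-IDF output: 3 documents x 5 features,
# mostly zeros, stored in CSR format like TfidfVectorizer's result.
x_train = sparse.csr_matrix(
    np.array([
        [0.0, 0.7, 0.0, 0.0, 0.3],
        [0.5, 0.0, 0.0, 0.5, 0.0],
        [0.0, 0.0, 0.9, 0.0, 0.0],
    ])
)

# Casting changes the dtype but NOT the storage format: still sparse.
x_train = x_train.astype('float32')
print(type(x_train).__name__, x_train.dtype, sparse.issparse(x_train))

# Converting explicitly yields a plain NumPy array, i.e. ordinary dense input.
x_dense = x_train.toarray()
print(type(x_dense).__name__, x_dense.dtype, x_dense.shape)
```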

Answer 1

I also got the error message

TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> [...]

when I built a model based on the Google Machine Learning Guide for text classification.

Calling todense() on the vectorized training and validation texts worked for me:

x_train = vectorizer.fit_transform(train_texts).todense()
x_val = vectorizer.transform(val_texts).todense()

(It seems to be very slow, though; I had to limit the number of training samples.)
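Much of the slowdown is likely memory: a dense float32 matrix needs rows × columns × 4 bytes, so 25,000 IMDb training reviews × 20,000 features (TOP_K) come to 2,000,000,000 bytes, about 2 GB, for x_train alone. One alternative, sketched here under the assumption that the model trains on mini-batches, is to keep the data sparse and densify one batch at a time. The helper names `dense_bytes` and `dense_batches`, the batch size of 128, and the shuffling are hypothetical choices, not part of the guide.

```python
import numpy as np
from scipy import sparse


def dense_bytes(n_rows, n_cols, itemsize=4):
    """Bytes needed to store an n_rows x n_cols dense array (float32 = 4 bytes)."""
    return n_rows * n_cols * itemsize


def dense_batches(x, y, batch_size, shuffle=True, seed=0):
    """Hypothetical helper: endlessly yield (dense_x, y) mini-batches from sparse x.

    Only the current batch is converted with toarray(), so memory for the
    dense data scales with batch_size x n_features, not n_rows x n_features.
    """
    x = sparse.csr_matrix(x)  # CSR supports efficient row selection
    y = np.asarray(y)
    n_rows = x.shape[0]
    rng = np.random.default_rng(seed)
    while True:  # one full pass per loop iteration, repeated forever
        order = rng.permutation(n_rows) if shuffle else np.arange(n_rows)
        for start in range(0, n_rows, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx].toarray(), y[idx]


# Worked estimate for the question's setup: 25,000 reviews x 20,000 features.
full_bytes = dense_bytes(25_000, 20_000)
batch_bytes = dense_bytes(128, 20_000)
print(full_bytes / 1e9, "GB for the full matrix;", batch_bytes / 1e6, "MB per batch of 128")
```

Because the generator never ends on its own, with tf.keras you would pass an explicit step count, e.g. model.fit(dense_batches(x_train, train_labels, 128), steps_per_epoch=math.ceil(x_train.shape[0] / 128), epochs=...).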

EDIT:

When I remove this line instead of adding .todense(), it also seems to work:

model.add(Dropout(rate=dropout_rate, input_shape=x_train.shape[1:]))

For more details, see this discussion: https://github.com/tensorflow/tensorflow/issues/47931

Answer 2

There is a similar open issue that you can find here.

The proposed solution is to use TensorFlow 2.1.0 and Keras 2.3.1.

Answer 3

What I did to solve this issue was convert the sparse matrices to dense arrays by calling .toarray().

So in your code, replace the return statement at the very end of your ngram_vectorize() function with:

return x_train.toarray(), x_val.toarray()
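toarray() is slightly different from the todense() call in Answer 1: todense() on a SciPy sparse matrix returns a numpy.matrix, while toarray() returns a plain numpy.ndarray, and NumPy's documentation no longer recommends np.matrix for new code. Both hold the same values, as this small sketch with a hypothetical 2 × 3 matrix shows:

```python
import numpy as np
from scipy import sparse

# Hypothetical 2 x 3 sparse matrix standing in for vectorized texts.
x_sparse = sparse.csr_matrix(
    np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]], dtype=np.float32)
)

as_matrix = x_sparse.todense()  # numpy.matrix (always 2-D)
as_array = x_sparse.toarray()   # plain numpy.ndarray

print(type(as_matrix).__name__, type(as_array).__name__)

# Same values either way; np.asarray converts the matrix to a plain ndarray.
same = np.array_equal(np.asarray(as_matrix), as_array)
print(same)
```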

3 answers

Answer 1
2 votes, 2021-03-03 10:26:29

Answer 2
0 votes, accepted, 2020-10-12 12:43:05

Answer 3
0 votes, 2022-03-19 08:54:31