TypeError: Failed to convert object of type SparseTensor to Tensor
I am building a text classification model for the IMDb sentiment analysis dataset. I downloaded the dataset and followed the tutorial given here: https://developers.google.com/machine-learning/guides/text-classification/step-4
The error I get is:
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("DeserializeSparse:0", shape=(None, 2), dtype=int64), values=Tensor("DeserializeSparse:1", shape=(None,), dtype=float32), dense_shape=Tensor("stack:0", shape=(2,), dtype=int64)). Consider casting elements to a supported type.
The type of x_train and x_val is scipy.sparse.csr.csr_matrix. This gives an error when passed to a Sequential model. How can I solve this?
import tensorflow as tf
import numpy as np
from tensorflow.python.keras.preprocessing import sequence
from tensorflow.python.keras.preprocessing import text
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
# Vectorization parameters
# Range (inclusive) of n-gram sizes for tokenizing text.
NGRAM_RANGE = (1, 2)
# Limit on the number of features. We use the top 20K features.
TOP_K = 20000
# Whether text should be split into word or character n-grams.
# One of 'word', 'char'.
TOKEN_MODE = 'word'
# Minimum document/corpus frequency below which a token will be discarded.
MIN_DOCUMENT_FREQUENCY = 2
# Limit on the length of text sequences. Sequences longer than this
# will be truncated.
MAX_SEQUENCE_LENGTH = 500
def ngram_vectorize(train_texts, train_labels, val_texts):
    """Vectorizes texts as ngram vectors.

    1 text = 1 tf-idf vector the length of vocabulary of uni-grams + bi-grams.

    # Arguments
        train_texts: list, training text strings.
        train_labels: np.ndarray, training labels.
        val_texts: list, validation text strings.

    # Returns
        x_train, x_val: vectorized training and validation texts
    """
    # Create keyword arguments to pass to the 'tf-idf' vectorizer.
    kwargs = {
        'ngram_range': NGRAM_RANGE,  # Use 1-grams + 2-grams.
        'dtype': 'int32',
        'strip_accents': 'unicode',
        'decode_error': 'replace',
        'analyzer': TOKEN_MODE,  # Split text into word tokens.
        'min_df': MIN_DOCUMENT_FREQUENCY,
    }
    vectorizer = TfidfVectorizer(**kwargs)

    # Learn vocabulary from training texts and vectorize training texts.
    x_train = vectorizer.fit_transform(train_texts)

    # Vectorize validation texts.
    x_val = vectorizer.transform(val_texts)

    # Select top 'k' of the vectorized features.
    selector = SelectKBest(f_classif, k=min(TOP_K, x_train.shape[1]))
    selector.fit(x_train, train_labels)
    x_train = selector.transform(x_train)
    x_val = selector.transform(x_val)

    x_train = x_train.astype('float32')
    x_val = x_val.astype('float32')
    return x_train, x_val
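The "DeserializeSparse" ops in the traceback suggest that Keras has already turned the scipy CSR matrix into a tf.SparseTensor, and that some layer further on cannot consume it. If you would rather keep the data sparse than densify it, one option is to do that conversion explicitly. This is a minimal sketch, assuming TensorFlow 2.x with eager execution; csr_to_sparse_tensor and x_small are hypothetical names, and the tiny matrix merely stands in for the output of ngram_vectorize().

```python
import numpy as np
import tensorflow as tf
from scipy import sparse


def csr_to_sparse_tensor(matrix):
    """Hypothetical helper: convert a scipy sparse matrix to a float32 tf.SparseTensor."""
    coo = matrix.tocoo()  # COO format exposes the row/column index arrays directly
    indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)
    values = coo.data.astype(np.float32)
    st = tf.SparseTensor(indices=indices, values=values, dense_shape=coo.shape)
    # TensorFlow expects sparse indices in canonical row-major order.
    return tf.sparse.reorder(st)


# Tiny hypothetical stand-in for the tf-idf features returned by ngram_vectorize().
x_small = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                      [0.3, 0.0, 0.2]], dtype=np.float32))
x_tensor = csr_to_sparse_tensor(x_small)
print(x_tensor.dense_shape.numpy())
print(tf.sparse.to_dense(x_tensor).numpy())
```

Keeping the tensor sparse only helps if every layer that receives it accepts sparse input; otherwise the same conversion error comes back.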
I also got the error message
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> [...]
when I built a model based on the Google Machine Learning Guide for text classification.
Calling todense() on the vectorized training and validation texts worked for me:
x_train = vectorizer.fit_transform(train_texts).todense()
x_val = vectorizer.transform(val_texts).todense()
(It seems to be very slow, though; I had to limit the number of training samples.)
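A rough estimate shows why densifying is slow and memory-hungry: a dense matrix stores every zero, while CSR stores only the non-zero values plus their indices. The figures below are assumptions for illustration, not measurements from the post: 25,000 training reviews (the size of the IMDb training split), TOP_K = 20,000 features, and a hypothetical average of 200 non-zero features per review.

```python
# Back-of-the-envelope memory estimate: dense vs. CSR storage for the
# tf-idf feature matrix. All sizes below are assumptions for illustration.

BYTES_PER_FLOAT32 = 4
BYTES_PER_INT32_INDEX = 4


def dense_bytes(num_rows, num_cols, value_bytes=BYTES_PER_FLOAT32):
    """Bytes needed for a dense num_rows x num_cols array (every zero is stored)."""
    return num_rows * num_cols * value_bytes


def csr_bytes(nnz, num_rows, value_bytes=BYTES_PER_FLOAT32,
              index_bytes=BYTES_PER_INT32_INDEX):
    """Approximate CSR footprint: values + column indices + row pointers."""
    return nnz * (value_bytes + index_bytes) + (num_rows + 1) * index_bytes


NUM_REVIEWS = 25_000     # assumed: size of the IMDb training split
NUM_FEATURES = 20_000    # TOP_K from the question's code
NNZ_PER_REVIEW = 200     # hypothetical average number of non-zero features per review

dense = dense_bytes(NUM_REVIEWS, NUM_FEATURES)
csr = csr_bytes(NUM_REVIEWS * NNZ_PER_REVIEW, NUM_REVIEWS)
print(f"dense: {dense / 2**30:.2f} GiB")  # dense: 1.86 GiB
print(f"csr:   {csr / 2**20:.2f} MiB")    # csr:   38.24 MiB
```

Under these assumptions the dense copy is roughly 50 times larger, which is consistent with having to cut down the number of training samples.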
EDIT:
When I remove this line (instead of adding .todense()), it also seems to work:
model.add(Dropout(rate=dropout_rate, input_shape=x_train.shape[1:]))
For more details, see this discussion: https://github.com/tensorflow/tensorflow/issues/47931
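The EDIT reports that removing the leading Dropout layer also avoids the error. A plausible explanation, not confirmed in the post, is that a Dense layer can multiply a sparse input directly, whereas Dropout appears to require a dense tensor. This minimal sketch assumes the tf.keras functional API: Dense is the first layer and Dropout comes only after it. build_sparse_friendly_mlp and its hyperparameters are hypothetical and are not the guide's model function.

```python
import tensorflow as tf


def build_sparse_friendly_mlp(num_features, units=64, dropout_rate=0.2):
    """Hypothetical MLP whose first layer is Dense, so it can accept sparse input.

    Dropout is applied only to the output of the first Dense layer,
    which is an ordinary dense tensor.
    """
    inputs = tf.keras.Input(shape=(num_features,), sparse=True)
    x = tf.keras.layers.Dense(units, activation="relu")(inputs)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # binary sentiment
    return tf.keras.Model(inputs=inputs, outputs=outputs)


model = build_sparse_friendly_mlp(num_features=20000)
model.summary()
```

Placing Dropout after the first Dense layer still regularizes the hidden activations; it only stops dropping raw input features.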
What I did to solve this issue was convert the scipy sparse matrices to dense NumPy arrays by calling .toarray().
So in your code, at the very end of your ngram_vectorize() function, replace the return statement with:
return x_train.toarray(), x_val.toarray()
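The difference between the two conversions is the return type. On a scipy csr_matrix, .toarray() returns a plain numpy.ndarray, while .todense() returns a numpy.matrix, which recent scikit-learn releases warn about or reject as input. This small sketch, assuming NumPy and SciPy are installed, shows the difference; to_dense_arrays and the toy matrices are hypothetical.

```python
import numpy as np
from scipy import sparse


def to_dense_arrays(*matrices):
    """Hypothetical helper: convert scipy sparse matrices to float32 NumPy ndarrays."""
    return tuple(np.asarray(m.toarray(), dtype=np.float32) for m in matrices)


# Tiny hypothetical stand-ins for the tf-idf output.
x_train_sparse = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                             [0.3, 0.0, 0.0]], dtype=np.float32))
x_val_sparse = sparse.csr_matrix(np.array([[0.0, 0.0, 0.7]], dtype=np.float32))

x_train, x_val = to_dense_arrays(x_train_sparse, x_val_sparse)
print(type(x_train_sparse.todense()).__name__)  # matrix
print(type(x_train).__name__, x_train.shape)    # ndarray (2, 3)
```

Either way the data becomes dense, so the memory cost of densifying applies to both.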
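Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.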