Error when merging two 2D arrays: zero-dimensional arrays cannot be concatenated
I am working on a binary text classification task, and I've applied the vectorizer to my data as follows:
count_vect = CountVectorizer(tokenizer=tokens)
X_train_counts = count_vect.fit_transform(docs_train.data)
print X_train_counts.shape
(150, 370)
Because I want to take only a random sample from class '0' (a in my example) and classify it against class '1', I did the following:
x = X_train_counts
y = docs_train.target
a_x,a_y=x[y==0,:],y[y==0]
b_x,b_y=x[y==1,:],y[y==1]
inds=np.random.choice(range(a_x.shape[0]),50)
random_x=a_x[inds,:]
random_y=a_y[inds]
x_merged=np.concatenate((random_x,b_x))
y_merged=np.concatenate((random_y,b_y))
X_train,y_train=shuffle(x_merged, y_merged, random_state=0)
but I always get the following error:
x_merged=np.concatenate((random_x,b_x))
ValueError: zero-dimensional arrays cannot be concatenated
even though printing the shapes gives me:
print random_x.shape
print b_x.shape
(50, 370)
(50, 370)
Any idea how to fix it, while of course preserving the row order, since the rows are linked to the labels?
Update: Here are the contents and type of each array, printed with the following commands:
print random_x[:5,:].toarray()
print b_x[:5,:].toarray()
print (type(random_x))
print (type(b_x))
[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[4 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]
[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]
<class 'scipy.sparse.csr.csr_matrix'>
<class 'scipy.sparse.csr.csr_matrix'>
EDIT: Apparently SciPy has its own concatenation functions for sparse matrices, including hstack and vstack.
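As a rough sketch of how that could look for the sampling step in the question, the snippet below stacks the sparse rows with scipy.sparse.vstack and concatenates only the dense label vectors with NumPy. The small random matrix, the label layout, and the sample size are stand-ins invented for illustration, and replace=False is an assumption (np.random.choice samples with replacement by default).

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.RandomState(0)

# Stand-in data (hypothetical): a small sparse count matrix and binary labels.
x = sp.csr_matrix(rng.randint(0, 3, size=(12, 5)))
y = np.array([0] * 8 + [1] * 4)

# Split rows by class using integer row indices.
a_rows = np.flatnonzero(y == 0)
b_rows = np.flatnonzero(y == 1)
a_x, a_y = x[a_rows], y[a_rows]
b_x, b_y = x[b_rows], y[b_rows]

# Sample as many class-0 rows as there are class-1 rows.
# replace=False is an assumption; drop it to sample with replacement.
inds = rng.choice(a_x.shape[0], b_x.shape[0], replace=False)
random_x, random_y = a_x[inds], a_y[inds]

# Sparse rows are stacked with scipy.sparse.vstack; labels are ordinary
# 1-D NumPy arrays, so np.concatenate is fine for them.
x_merged = sp.vstack([random_x, b_x], format="csr")
y_merged = np.concatenate([random_y, b_y])

print(x_merged.shape, y_merged.shape)
```

Row i of x_merged still corresponds to y_merged[i], and the matrix never has to be converted to a dense array.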
The problem is indeed the type. To solve it, convert each csr_matrix into a dense array, concatenate, and then convert the result back to a csr_matrix:
import numpy as np
import scipy.sparse as m
a = np.zeros((50, 370))
b = np.zeros((50, 370))
am = m.csr_matrix(a).toarray()
bm = m.csr_matrix(b).toarray()
cm = m.csr_matrix(np.concatenate((am,bm)))
print(am.shape,bm.shape,cm.shape)
The result is:
(50, 370) (50, 370) (100, 370)
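To see why the type matters, here is a small sketch of what NumPy does with a sparse matrix. It assumes that NumPy has no special handling for SciPy sparse matrices and so wraps each one as a single opaque object; the variable names are made up for the example.

```python
import numpy as np
import scipy.sparse as sp

s = sp.csr_matrix(np.zeros((50, 370)))

# NumPy does not unpack the sparse matrix; it wraps it as one object,
# which gives a zero-dimensional array even though s.shape is (50, 370).
wrapped = np.asarray(s)
print(s.shape, wrapped.shape, wrapped.dtype)

# np.concatenate therefore sees zero-dimensional inputs and refuses them.
try:
    np.concatenate((s, s))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

This is why the shapes printed in the question look fine while the concatenation still fails.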