Does CountVectorizer().fit_transform() preserve order of input?
I'm wondering whether the output of CountVectorizer().fit_transform() preserves the order of the input.
My input is a list of documents. I know that the output has the same length as the input, but I'm not sure whether the two are ordered the same way.
I may not be explaining this very well, so here's an example.
Say I have:
input = ["<text_1>", "<text_2>", "<text_3>"]
a = CountVectorizer().fit_transform(input)
Will the row indexes of the output correspond to the positions in the input list?
For example, in:
(0, 33) 1
...
(0, 42) 8
...
(385, 58) 1
(385, 51) 6
Is (0, 33) 1 equivalent to input[0], or (385, 58) 1 to input[385]?
Yes, the row order is preserved. This has to hold for all scikit-learn transformation methods, because a common workflow is to split your data into a feature matrix X and a target vector y, where each row of the matrix corresponds to one element of the vector. When you transform X, you must still be able to train the model on the transformed X paired with y, so the order must be preserved. In your output, the first number of each (row, column) pair is the row index, so every entry that starts with 0 describes input[0].
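If you want to check this yourself, here is a minimal sketch. The three toy documents and the variable names (docs, vocab, dense) are made up for illustration and are not from the question. It counts the tokens of each document by hand and compares them with the matching row of the matrix, assuming CountVectorizer's default lowercasing and tokenization.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus; each document has a distinct token profile.
docs = ["apple banana", "banana banana cherry", "cherry"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

vocab = vectorizer.vocabulary_  # maps each token to its column index
dense = X.toarray()             # dense copy, fine for a tiny example

# Row i should contain exactly the token counts of docs[i].
for i, doc in enumerate(docs):
    tokens = doc.split()
    for token in set(tokens):
        assert dense[i, vocab[token]] == tokens.count(token)
    # No counts other than those of docs[i] appear in row i.
    assert dense[i].sum() == len(tokens)

print("row i of X matches docs[i] for every i")
```

The column order is a different matter: columns follow the learned vocabulary, not the order in which words appear, so look up a word's column through vectorizer.vocabulary_ rather than assuming a position.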