Does CountVectorizer().fit_transform() preserve input order?

Question

I'm wondering whether the output of CountVectorizer().fit_transform() preserves the order of the input.

My input is a list of documents. I know that the output matches the input in length, but I'm not sure whether the rows are ordered the same way.

I may not be explaining this very well, so here's an example.

Say I have:

from sklearn.feature_extraction.text import CountVectorizer

input = ["<text_1>", "<text_2>", "<text_3>"]
a = CountVectorizer().fit_transform(input)

Will the indexes correspond, as though order is preserved?

For example, in:

  (0, 33)   1
...
  (0, 42)   8
...
  (385, 58) 1
  (385, 51) 6

Is (0, 33) 1 equivalent to input[0], or (385, 58) 1 to input[385]?

Answer 1

Yes, the row order is preserved. This must be true for all scikit-learn transformation methods, because a common workflow is to split your data into a feature matrix X and a target vector y, where each row of the matrix corresponds to one element of the vector. When you transform X, you must still be able to train the model on the transformed X paired with y, so the order must be preserved.
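To check this on a small case, here is a minimal sketch, assuming a reasonably recent scikit-learn install. The documents in docs and the helper row_counts are made up for illustration; they are not from the question. Each document uses a distinct mix of words, so it is easy to see that row i of the matrix holds the counts for docs[i].

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents, chosen so each row is easy to recognise.
docs = ["apple apple banana", "cherry", "banana cherry cherry"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document


def row_counts(i):
    """Return {token: count} for the non-zero entries of row i."""
    row = X[i].toarray().ravel()
    # vocabulary_ maps each token to its column index in X.
    return {tok: int(row[j]) for tok, j in vectorizer.vocabulary_.items() if row[j] > 0}


for i, doc in enumerate(docs):
    # Row i should contain exactly the word counts of docs[i].
    print(i, repr(doc), row_counts(i))
```

In a sparse printout such as (0, 33) 1, the first number is the row, which is the position of the document in the input list, and the second is the column, which is the vocabulary index of a token.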

Solution 1
1 accepted 2022-05-03 12:08:56