Different size of array after fit_transform
I have a problem with the fit_transform function. Can someone explain why the arrays have different sizes?
In [5]: X.shape, test.shape
Out[5]: ((1000, 1932), (1000, 1932))
In [6]: from sklearn.feature_selection import VarianceThreshold
sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
features = sel.fit_transform(X)
features_test = sel.fit_transform(test)
In [7]: features.shape, features_test.shape
Out[7]: ((1000, 1663), (1000, 1665))
UPD: Which transformation can help me get arrays with the same size?
It is because you are fitting your selector twice.
First, note that fit_transform is just a call to fit followed by a call to transform.
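A minimal sketch of that equivalence, assuming scikit-learn and NumPy are installed; the small boolean matrix X is hypothetical toy data, not the question's dataset:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: 6 samples, 3 boolean features
X = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0],
              [0, 1, 1]])

# One call: fit and transform together
a = VarianceThreshold(threshold=0.8 * (1 - 0.8)).fit_transform(X)

# Two calls: fit on X, then transform the same X
sel = VarianceThreshold(threshold=0.8 * (1 - 0.8))
sel.fit(X)
b = sel.transform(X)

print(a.shape, b.shape)
print(np.array_equal(a, b))  # both paths give the same reduced array
```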
The fit method allows your VarianceThreshold selector to find the features it wants to keep in the dataset, based on the parameters you gave it: it computes the variance of each feature in the data you pass and keeps only the features whose variance is above the threshold.
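To see what fit actually stores, here is a sketch on the same kind of hypothetical toy data. The threshold .8 * (1 - .8) = 0.16 is the variance of a Boolean feature that is 1 in 80% of samples, so Boolean features that take the same value in about 80% or more of the rows are dropped. variances_ and get_support() are the fitted selector's per-feature variances and its boolean "keep" mask:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: 6 samples, 3 boolean features
X = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0],
              [0, 1, 1]])

sel = VarianceThreshold(threshold=0.8 * (1 - 0.8))  # about 0.16
sel.fit(X)  # only learns statistics from X; X itself is not modified

print(sel.variances_)     # variance of each column, computed on X
print(sel.get_support())  # True where the column's variance exceeds the threshold
```

Here the first column is 1 in only one of six rows (variance 5/36, about 0.139), so it is the one that gets dropped.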
The transform method performs the actual feature selection and returns an array with just the selected features.
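The following sketch (again on hypothetical toy data) checks that transform simply reuses the boolean mask learned during fit, so any array with the same columns is reduced to the same selected features:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data used for fitting
X = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0],
              [0, 1, 1]])
# Hypothetical new rows with the same 3 columns
X_new = np.array([[1, 1, 0],
                  [0, 0, 1]])

sel = VarianceThreshold(threshold=0.8 * (1 - 0.8)).fit(X)

# transform keeps exactly the columns chosen during fit...
reduced = sel.transform(X_new)
# ...which matches indexing the columns with the learned mask
manual = X_new[:, sel.get_support()]

print(reduced.shape)
print(np.array_equal(reduced, manual))
```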
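Because fit_transform applies a dimensionality reduction to the array, the resulting arrays' dimensions are not the same as the input's. And because you also call fit_transform on test, the selector is refit on the test data and can keep a different set of features than it kept for X (1665 columns instead of 1663). To get arrays with the same columns, fit the selector on X only and call transform, not fit_transform, on test: features_test = sel.transform(test).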
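Putting it together, a sketch of the problem and the fix on hypothetical train/test arrays, chosen so that refitting on the test set keeps a different number of columns:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical train/test arrays with the same 3 features
X = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0],
              [0, 1, 1], [0, 1, 0], [0, 1, 1]])
test = np.array([[1, 0, 1], [0, 1, 0],
                 [1, 0, 1], [0, 1, 0]])

threshold = 0.8 * (1 - 0.8)

# What the question does: a second fit on test selects its own columns
features = VarianceThreshold(threshold=threshold).fit_transform(X)
refit_test = VarianceThreshold(threshold=threshold).fit_transform(test)
print(features.shape, refit_test.shape)  # (6, 2) (4, 3)

# Fix: fit once on the training data, then only transform the test data
sel = VarianceThreshold(threshold=threshold)
features = sel.fit_transform(X)
features_test = sel.transform(test)
print(features.shape, features_test.shape)  # (6, 2) (4, 2)
```

Fitting only on the training data also keeps statistics from the test set out of the preprocessing step.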
See the question "What is the difference between 'transform' and 'fit_transform' in sklearn?" and http://scikit-learn.org/stable/modules/feature_extraction.html