Apply the PCA reduction after choosing n components

I have a data set and I want to reduce its dimensionality with PCA.
I import the data set, split it into training and validation sets, standardize the features, and apply PCA to see the cumulative explained variance.
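For reference, the quantity the code below plots as `cum_var` is the cumulative explained variance ratio. Writing $\lambda_i$ for the $i$-th entry of `explained_variance_` (sorted in decreasing order) and $p$ for the number of components a full PCA returns, it is:

```latex
\mathrm{cum\_var}_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}, \qquad k = 1, \dots, p
```

When all $p$ components are kept, this is the same as `np.cumsum(model.explained_variance_ratio_)`.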

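import numpy as np
import matplotlib.pyplot as plt
from pandas import read_csv
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = read_csv("train.csv")
X = df.drop(['label'], axis=1)
y = df['label']
X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize using statistics learned from the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_validation_scaled = scaler.transform(X_validation)

# Fit PCA with all components to inspect the explained variance
model = PCA()
model.fit(X_train_scaled)
variance = model.explained_variance_
cum_var = np.cumsum(variance) / np.sum(variance)
components = range(1, len(cum_var) + 1)  # one bar per component

plt.figure(figsize=(15, 15))
plt.bar(components, cum_var * 100, alpha=0.5, align='center', label='cumulative variance')
plt.legend()
plt.ylabel('Cumulative variance (%)')
plt.xlabel('Principal components')

for i, v in zip(components, cum_var):
    label = "{:.2f}".format(v)
    plt.annotate(label,                       # the text
                 (i, v * 100),                # the point to label (same scale as the bars)
                 textcoords="offset points",  # how to position the text
                 xytext=(0, 10),              # offset of the text from the point
                 ha='center')
plt.show()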
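Picking the number of components by eye from the bar chart works. As a sketch of doing the same thing programmatically, the snippet below finds the smallest number of components whose cumulative explained variance exceeds a threshold, and compares it with scikit-learn's built-in selection (a float in (0, 1) passed as `n_components` together with `svd_solver='full'`). The 0.95 threshold and the synthetic `make_classification` data are assumptions standing in for `train.csv`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train.csv (assumption): 500 rows, 25 features
X, _ = make_classification(n_samples=500, n_features=25, n_informative=10, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Full PCA; full SVD is used in both fits so the two selections see the same spectrum
model = PCA(svd_solver='full').fit(X_scaled)
cum_var = np.cumsum(model.explained_variance_ratio_)

# Smallest k whose cumulative explained variance exceeds the threshold
threshold = 0.95  # hypothetical target; choose what suits your problem
n_components = int(np.argmax(cum_var > threshold)) + 1
print(n_components, round(float(cum_var[n_components - 1]), 4))

# scikit-learn performs the same selection when n_components is a float in (0, 1)
auto = PCA(n_components=threshold, svd_solver='full').fit(X_scaled)
print(auto.n_components_)
```

The built-in form is convenient when the threshold, rather than a fixed count, is what you actually care about.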

I chose n_components = 16.
How can I apply n_components = 16 to X_train and X_validation?

Once you know the appropriate number of components, run PCA again with n_components=16, fit it on the training set only, and use it to transform both sets. The reduced features then become your new X_train and X_validation:

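from sklearn.decomposition import PCA
# Keep 16 components and fit on the training data only (pass the standardized
# arrays instead if that is what n_components was chosen on)
model = PCA(n_components=16)
model.fit(X_train)
# Project both sets onto the components learned from X_train
X_train = model.transform(X_train)
X_validation = model.transform(X_validation)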
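A common way to keep the scaling and the reduction consistent between the two splits is to chain them in a scikit-learn pipeline. The sketch below assumes the features were standardized before PCA (as `X_train_scaled` in the question suggests) and uses synthetic data with 25 features in place of `train.csv`; `reducer` and the `*_reduced` names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train.csv (assumption): 25 features, as range(1, 26) in the question suggests
X, y = make_classification(n_samples=500, n_features=25, n_informative=10, random_state=42)
X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaler and PCA are both fitted on the training split only
reducer = make_pipeline(StandardScaler(), PCA(n_components=16))
X_train_reduced = reducer.fit_transform(X_train)
# The validation split is projected with the mean, scale and components learned from training
X_validation_reduced = reducer.transform(X_validation)

print(X_train_reduced.shape, X_validation_reduced.shape)  # (400, 16) (100, 16)
print(f"variance kept: {reducer[-1].explained_variance_ratio_.sum():.3f}")
```

Because PCA orders its components by explained variance, refitting with `n_components=16` should recover the same leading 16 components (up to sign) that the full fit found, so the cumulative variance read off the chart still describes what is kept.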
