How can I apply my PCA results to my unlabelled data?
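I am struggling to put my PCA, which I fitted successfully, to use.
This is what my PCA plot looks like:
I obtained it from accelerometer data (x, y, z) that I recorded and labelled with A, S and D.
I can find plenty of information online about how to perform a PCA, but now I want to apply it to my new data. I can't find any information about that, or I am doing something wrong.
Here is my code: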
import pandas as pd
import os
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
os.chdir(r'C:\Users\##\OneDrive - ##\##\Pyth\data\runFlume1')
os.getcwd()
## read csv
df = pd.read_csv('dataframe_0.csv', delimiter=',', names = ['x','y','z','gradient_x','gradient_y','gradient_z','target'])
features = ['x', 'y', 'z']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['target']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['principal component 1', 'principal component 2'])
finalDf = pd.concat([principalDf, df[['target']]], axis = 1)
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['A','S','D']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
               , finalDf.loc[indicesToKeep, 'principal component 2']
               , c = color
               , s = 50)
ax.legend(targets)
ax.grid()
My original dataframe looks like this:
x y z gradient_x gradient_y gradient_z target
0 -0.875 -0.143 0.516 0.0310 0.0000 0.032 A
1 -0.844 -0.143 0.548 0.0155 0.0000 0.000 A
2 -0.844 -0.143 0.516 0.0000 0.0000 0.000 A
3 -0.844 -0.143 0.548 0.0000 0.0000 0.016 A
4 -0.844 -0.143 0.548 0.0000 0.0000 0.016 A
... ... ... ... ... ... ...
17947 0.969 -0.079 0.161 0.0000 0.0475 0.016 D
17948 1.000 -0.079 0.161 0.0000 0.0000 0.000 D
17949 0.969 -0.079 0.161 0.0155 0.0000 0.000 D
17950 0.969 -0.079 0.161 0.0000 0.0000 0.000 D
17951 0.969 -0.079 0.161 0.0000 0.0000 0.000 D
So I want to use this PCA on data that has no labels (A, D, S). Does anyone know how I can do this?
Kind regards,
Simon
You can simply take your `pca` object and `transform` the features of the unlabelled data. Something like:
unlabelled_df = pd.read_csv('dataframe_unlabeled.csv',
delimiter=',', names = ['x','y','z','gradient_x','gradient_y','gradient_z'])
features = ['x', 'y', 'z']
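# Separating out the features (from the unlabelled frame, not the original df)
x = unlabelled_df.loc[:, features].values
# Standardizing the features
# You need to retain your previous scaler and only `transform` here to avoid leakage,
# i.e. keep it as scaler = StandardScaler().fit(...) when fitting on the labelled data
x = scaler.transform(x)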
principalComponents = pca.transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['principal component 1', 'principal component 2'])
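For reference, here is a minimal end-to-end sketch of the same idea: fit the scaler and the PCA once on the labelled data, then only call `transform` on the new data. The random data and the names `labelled_df`, `unlabelled_df` and `new_principal_df` are assumptions for illustration; in practice they would come from your own CSV files.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ['x', 'y', 'z']

# Synthetic stand-ins for the accelerometer data (assumption, not real readings).
rng = np.random.default_rng(0)
labelled_df = pd.DataFrame(rng.normal(size=(100, 3)), columns=features)
labelled_df['target'] = rng.choice(['A', 'S', 'D'], size=100)
unlabelled_df = pd.DataFrame(rng.normal(size=(20, 3)), columns=features)

# Fit the scaler and the PCA once, on the labelled data only,
# and keep both fitted objects around.
scaler = StandardScaler().fit(labelled_df[features].values)
pca = PCA(n_components=2).fit(scaler.transform(labelled_df[features].values))

# Reuse both fitted objects on the new data: transform only, no refitting.
new_components = pca.transform(scaler.transform(unlabelled_df[features].values))
new_principal_df = pd.DataFrame(
    new_components,
    columns=['principal component 1', 'principal component 2'],
)
print(new_principal_df.shape)
```

The new points end up in the same two-component space as the labelled ones, so they can be drawn on the same scatter plot. Calling `fit_transform` again on the new data would instead produce different axes that are not comparable with the original plot.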