How do I apply my PCA results to unlabeled data?

Question

I have a working PCA, but I am struggling to apply it.

This is what my PCA plot looks like:

I obtained it from accelerometer data (x, y, z) that I observed and labelled with A, S and D.

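I can find plenty of information online about how to perform a PCA, but now I want to apply it to my new data. Either I can't find any information on that, or I'm doing something wrong.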

This is my code:

import pandas as pd
import os
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

os.chdir(r'C:\Users\##\OneDrive - ##\##\Pyth\data\runFlume1')
os.getcwd()


## read csv
df = pd.read_csv('dataframe_0.csv', delimiter=',', names = ['x','y','z','gradient_x','gradient_y','gradient_z','target'])


features = ['x', 'y', 'z']

# Separating out the features
x = df.loc[:, features].values

# Separating out the target
y = df.loc[:,['target']].values

# Standardizing the features
x = StandardScaler().fit_transform(x)


pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])

finalDf = pd.concat([principalDf, df[['target']]], axis = 1)

fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1) 
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)

targets = ['A','S','D']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
               , finalDf.loc[indicesToKeep, 'principal component 2']
               , c = color
               , s = 50)
ax.legend(targets)
ax.grid()

My original dataframe looks like this:

       x      y      z  gradient_x  gradient_y  gradient_z target
0     -0.875 -0.143  0.516      0.0310      0.0000       0.032      A
1     -0.844 -0.143  0.548      0.0155      0.0000       0.000      A
2     -0.844 -0.143  0.516      0.0000      0.0000       0.000      A
3     -0.844 -0.143  0.548      0.0000      0.0000       0.016      A
4     -0.844 -0.143  0.548      0.0000      0.0000       0.016      A
     ...    ...    ...         ...         ...         ...    ...
17947  0.969 -0.079  0.161      0.0000      0.0475       0.016      D
17948  1.000 -0.079  0.161      0.0000      0.0000       0.000      D
17949  0.969 -0.079  0.161      0.0155      0.0000       0.000      D
17950  0.969 -0.079  0.161      0.0000      0.0000       0.000      D
17951  0.969 -0.079  0.161      0.0000      0.0000       0.000      D

So I want to use this PCA on data that has no labels (A, D, S). Does anyone know how I can do that?

Kind regards,

Simon

Answer 1

You can simply take your fitted pca object and call transform on the features of the unlabeled data. Something like:

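unlabelled_df = pd.read_csv('dataframe_unlabeled.csv',
                delimiter=',', names = ['x','y','z','gradient_x','gradient_y','gradient_z'])


features = ['x', 'y', 'z']

# Separating out the features of the unlabelled data
# (read from unlabelled_df, not from the labelled df)
x_new = unlabelled_df.loc[:, features].values

# Standardizing the features
# You need to retain the scaler fitted on the labelled data and only `transform`
# here to avoid leakage. The question's code does not store it, so fit it as
#   scaler = StandardScaler().fit(<raw labelled features>)
# before running the PCA.
x_new = scaler.transform(x_new)

# Project onto the components learned from the labelled data (no refitting)
principalComponents = pca.transform(x_new)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])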
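
One way to avoid losing track of the fitted scaler is to bundle it with the PCA in a scikit-learn pipeline. The following is only a minimal sketch of that idea, not the asker's actual setup: the arrays `labelled_xyz` and `unlabelled_xyz` are synthetic stand-ins for the accelerometer readings, and the name `pca_pipeline` is made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption) for the labelled and unlabelled x, y, z readings.
labelled_xyz = rng.normal(size=(200, 3))
unlabelled_xyz = rng.normal(size=(50, 3))

# Fit the scaler and the PCA once, on the labelled data only.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
labelled_pcs = pca_pipeline.fit_transform(labelled_xyz)

# Project the new data with transform (no refitting), so it is scaled and
# rotated with the parameters learned from the labelled data.
unlabelled_pcs = pca_pipeline.transform(unlabelled_xyz)

print(labelled_pcs.shape, unlabelled_pcs.shape)  # (200, 2) (50, 2)
```

Because the pipeline stores both fitted steps, a single `transform` call reproduces the two-step `scaler.transform` then `pca.transform` shown above, and the projected points can be drawn on the same axes as the labelled A, S and D clusters.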
