Why can I use a key from the original data to color a plot of a PCA numpy.ndarray?

Hi, I have a conceptual question about code that works fine.

I am running PCA on the load_breast_cancer dataset from sklearn. After running the PCA, I plot the data on the first two principal components, and I know I can color the points by a key from the original load_breast_cancer dataset, namely 'target'.

The part I am particularly interested in is the plotting call, where I write "c=cancer['target']". How is the 'target' column retained through all of the PCA and scaling, especially since x_pca is a numpy.ndarray with shape (569, 2)?

Code below:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline

#importing dataset
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'],columns=cancer['feature_names'])

#scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df)
scaled_data = scaler.transform(df)

# PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)

#Plotting
plt.figure(figsize=(8,6))
#x_pca is a NumPy array, not a DataFrame, so [:,0] and [:,1] select columns by position
plt.scatter(x_pca[:,0],x_pca[:,1],c=cancer['target'],cmap='plasma')
plt.xlabel('First PC')
plt.ylabel('Second PC')

Thank you!

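It seems that you run df through the scaler and PCA, and df does not include target as a column, so target is never transformed in the process. It stays untouched in cancer['target']. Scaling and PCA change the columns but keep the 569 rows in their original order, so cancer['target'][i] still labels row i of x_pca, and plt.scatter can pair the two arrays by position.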
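A minimal sketch of that point, assuming scikit-learn and NumPy are installed. The variable names X, y, row_alone and same_row, and the row index i = 5, are illustrative choices and not part of the question's code. It checks that the labels never go through the transforms and that row i of the output depends only on row i of the input:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X = cancer['data']    # features only, shape (569, 30)
y = cancer['target']  # labels, shape (569,); never passed to the scaler or PCA

# Same steps as in the question, keeping the fitted objects around.
scaler = StandardScaler().fit(X)
scaled_data = scaler.transform(X)
pca = PCA(n_components=2).fit(scaled_data)
x_pca = pca.transform(scaled_data)

# The number of columns changes (30 -> 2), the number of rows does not.
print(X.shape, x_pca.shape, y.shape)

# Transforming a single row on its own gives the same point as that row
# in the full result, so row order is preserved by both transforms.
i = 5  # arbitrary row index, chosen only for illustration
row_alone = pca.transform(scaler.transform(X[[i]]))
same_row = np.allclose(row_alone[0], x_pca[i])
print(same_row)
```

Because x_pca and y have the same length and the same row order, passing c=cancer['target'] to plt.scatter colors each point by the label of the sample it came from.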
