簡體   English   中英

Python sklearn PCA.explained_variance_ratio_不等於1

[英]Python sklearn PCA.explained_variance_ratio_ doesn't sum to 1

我正在使用sklearn的PCA模塊。 我正在使用以下代碼來設置分析。

from sklearn.decomposition import PCA
pca = PCA(n_components=9)
p = pca.fit([row[:-1] for row in norm])

這里的norm是我的規范化數據集,在最后一列中有一個唯一的標識符,這就是為什么我在最后一行中將其刪除。 此數據集中有9個功能部件,因此我希望有9個組件不會有無法解釋的方差。 但是,當我調用p.explained_variance_.cumsum()時,我得到:

[ 0.06589563  0.08608778  0.09578116  0.10150195  0.10703567  0.11036608
  0.11241904  0.11422285  0.11591605]

我是否對PCA有誤解? 之前我都沒用過這個模塊,但是已經有一段時間了。 我設置錯了嗎? 我剝離了所有識別信息的數據以發布到此處。 以下是似乎正在復制問題的數據子集。

[0.3888888888888889, 0.3888888888888889, 0.3888888888888889, 0.436943311456892, 0.7905900031193156, 0.5020468092219706, 0.8389717734280283, 0.7604923090797432, 0.8206054422776056, '0']
[0.3888888888888889, 0.3888888888888889, 0.2222222222222222, 0.4457200178477334, 0.8114779465247448, 0.506899600792241, 0.8368566485573798, 0.760617288778523, 0.8195489478905984, '1']
[0.2777777777777778, 0.2777777777777778, 0.05555555555555555, 0.4426231291814084, 0.7883413226205706, 0.5037172133121759, 0.8370362549229062, 0.7599752704033258, 0.8184218722901648, '2']
[0.1111111111111111, 0.1111111111111111, 0.16666666666666666, 0.4651807845446571, 0.7983379003654792, 0.5250604537887904, 0.8463875215362144, 0.7533582308429306, 0.8241548325954007, '3']
[0.5000000000000001, 0.5000000000000001, 0.3333333333333333, 0.4457200178477334, 0.7878040593905666, 0.506899600792241, 0.8368566485573798, 0.7605016058324149, 0.8195489478905984, '4']
[0.3888888888888889, 0.3888888888888889, 0.2222222222222222, 0.44943322185630036, 0.7843622888520198, 0.5055757644148106, 0.8351253941103399, 0.7604171267769607, 0.8185442945328569, '5']
[0.3888888888888889, 0.3888888888888889, 0.3333333333333333, 0.4424914587425397, 0.7877430312713435, 0.5029950110274568, 0.836692391332608, 0.760611529525946, 0.8198150075184326, '6']
[0.3333333333333333, 0.05555555555555555, 0.7777777777777778, 0.4389415113841421, 0.7878040593905666, 0.506899600792241, 0.8368566485573798, 0.7605016058324149, 0.8195489478905984, '7']
[0.4444444444444444, 0.4444444444444444, 0.4444444444444444, 0.42770705188736874, 0.7976039510596705, 0.5057230657076256, 0.8368566485573798, 0.7605016058324149, 0.8195489478905984, '8']
[0.2222222222222222, 0.2777777777777778, 0.5000000000000001, 0.43182322765312314, 0.7971732873351607, 0.5072390458086798, 0.84541364942531, 0.7613416598875292, 0.8239037851005895, '9']

這是虹膜數據集包括碎石圖的示例:

在此處輸入圖片說明


嘗試使用剛剛發布的數據集重現您的問題:

d = matrix([[0.3888888888888889, 0.3888888888888889, 0.3888888888888889, 0.436943311456892, 0.7905900031193156, 0.5020468092219706, 0.8389717734280283, 0.7604923090797432, 0.8206054422776056, '0'],
[0.3888888888888889, 0.3888888888888889, 0.2222222222222222, 0.4457200178477334, 0.8114779465247448, 0.506899600792241, 0.8368566485573798, 0.760617288778523, 0.8195489478905984, '1'],
[0.2777777777777778, 0.2777777777777778, 0.05555555555555555, 0.4426231291814084, 0.7883413226205706, 0.5037172133121759, 0.8370362549229062, 0.7599752704033258, 0.8184218722901648, '2'],
[0.1111111111111111, 0.1111111111111111, 0.16666666666666666, 0.4651807845446571, 0.7983379003654792, 0.5250604537887904, 0.8463875215362144, 0.7533582308429306, 0.8241548325954007, '3'],
[0.5000000000000001, 0.5000000000000001, 0.3333333333333333, 0.4457200178477334, 0.7878040593905666, 0.506899600792241, 0.8368566485573798, 0.7605016058324149, 0.8195489478905984, '4'],
[0.3888888888888889, 0.3888888888888889, 0.2222222222222222, 0.44943322185630036, 0.7843622888520198, 0.5055757644148106, 0.8351253941103399, 0.7604171267769607, 0.8185442945328569, '5'],
[0.3888888888888889, 0.3888888888888889, 0.3333333333333333, 0.4424914587425397, 0.7877430312713435, 0.5029950110274568, 0.836692391332608, 0.760611529525946, 0.8198150075184326, '6'],
[0.3333333333333333, 0.05555555555555555, 0.7777777777777778, 0.4389415113841421, 0.7878040593905666, 0.506899600792241, 0.8368566485573798, 0.7605016058324149, 0.8195489478905984, '7'],
[0.4444444444444444, 0.4444444444444444, 0.4444444444444444, 0.42770705188736874, 0.7976039510596705, 0.5057230657076256, 0.8368566485573798, 0.7605016058324149, 0.8195489478905984, '8'],
[0.2222222222222222, 0.2777777777777778, 0.5000000000000001, 0.43182322765312314, 0.7971732873351607, 0.5072390458086798, 0.84541364942531, 0.7613416598875292, 0.8239037851005895, '9']])

在此處輸入圖片說明

最后,在CV堆棧交換上的變形蟲給了我答案-這是一個簡單的錯字。 我調用了p.explained_variance_.cumsum()但正確的方法調用是p.explained_variance_ratio_.cumsum() 當然,差異不一定要加一!

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM