
ValueError: Input contains NaN, infinity or a value too large for dtype('float64') when trying to normalize data

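Why do I get this error when I have removed all NaNs and expressed the data as percentages? I can't work out the cause, because the data also shouldn't be too large or contain any infinite values.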

import pandas as pd
from sklearn import preprocessing
import numpy as np

# Sample of the Close_mid column (the full data comes from the linked file)
df = pd.DataFrame({'Close_mid': [752.69, 736.09, 746.39, 749.97, 761.68, 762.08, 768.05, 782.25, 784.65, 786.72, 770.59]})

def remove_nan(frame):
    # Drop rows containing NaN, modifying the frame in place
    frame.dropna(inplace=True)

df['returns'] = 100 * df['Close_mid'].pct_change()
remove_nan(df)
x_array = np.array(df['returns'])
x_array = x_array.reshape(-1, 1)
normalized_X = preprocessing.normalize(x_array)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
----> 5 normalized_X = preprocessing.normalize(x_array)

When I load the data file you linked to and describe the dataset, I find:

>>>print(df.describe())
       Close_large    Close_mid  Close_small
count  3077.000000  3077.000000  3077.000000
mean    121.647894   432.353685   425.614839
std      30.558998   204.394909   201.607000
min      53.410000     0.000000     0.000000
25%      98.800000   273.090000   278.740000
50%     117.040000   330.580000   339.260000
75%     147.690000   598.490000   547.570000
max     179.370000   870.110000   929.140000

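The min row is the interesting one: Close_mid and Close_small both reach 0, so computing the percentage change really does divide by zero. The row that follows a zero price gets an infinite return, and dropna() does not remove infinite values. The offending record is: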

>>>print(df[df['Close_mid'] == 0])
            Close_large  Close_mid  Close_small
Date                                           
2008-06-06        96.76        0.0          0.0
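
As a minimal sketch of the problem and one possible fix, the snippet below uses a short made-up price series (hypothetical values, with a single zero standing in for the 2008-06-06 record) rather than the real file. It shows that pct_change() yields inf on the row after the zero, and that converting inf to NaN before dropping leaves only finite values.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one zero, mimicking the 2008-06-06 record
prices = pd.Series([752.69, 736.09, 0.0, 749.97, 761.68], name="Close_mid")

returns = 100 * prices.pct_change()
# Row 0 is NaN (no previous value), row 2 is -100.0 (price fell to zero),
# and row 3 is +inf (the previous price, used as the divisor, is zero).

# dropna() removes NaN but keeps inf, so turn inf into NaN first
clean = returns.replace([np.inf, -np.inf], np.nan).dropna()

print(returns)
print(clean)
```

The cleaned series can then be reshaped and passed to preprocessing.normalize as in the question. Whether the zero prices should instead be treated as missing data and removed before computing returns depends on what they mean in the source file.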

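That said, you should probably first make sure the date columns are properly aligned (the date itself does not appear in this form in the data, and the date columns do not all share the same name).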
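
One way to check the alignment could look like the sketch below. The frame names, the column names Date_large and Date_mid, and all values except the 96.76 close are hypothetical, since the actual layout of the file is not shown here. The idea is to give every date column a common name, index by it, and outer-join so that dates missing from one series show up as NaN.

```python
import pandas as pd

# Hypothetical frames, each with its own date column name
large = pd.DataFrame({"Date_large": ["2008-06-05", "2008-06-06"],
                      "Close_large": [97.10, 96.76]})
mid = pd.DataFrame({"Date_mid": ["2008-06-05", "2008-06-09"],
                    "Close_mid": [601.20, 598.40]})

def by_date(frame, date_col):
    """Return a copy indexed by the parsed date column, renamed to 'Date'."""
    out = frame.rename(columns={date_col: "Date"})
    out["Date"] = pd.to_datetime(out["Date"])
    return out.set_index("Date")

# The outer join keeps every date; gaps in either series appear as NaN
aligned = by_date(large, "Date_large").join(by_date(mid, "Date_mid"), how="outer")

# Rows where at least one series has no value for that date
misaligned = aligned[aligned.isna().any(axis=1)]
print(misaligned)
```

If misaligned is non-empty, the series do not cover the same dates, and returns computed row by row would compare prices from different days.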

