ValueError: Input contains NaN, infinity or a value too large for dtype('float64') when trying to normalize data
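Why do I get this error when I have removed all NaNs and expressed the data as percentages? I can't work it out, because the data should not be too large or contain any infinite values either.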
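import pandas as pd
from sklearn import preprocessing
import numpy as np

# Small sample of the Close_mid column (the full data file has 3077 rows)
df = pd.DataFrame({'Close_mid': [752.69, 736.09, 746.39, 749.97, 761.68, 762.08, 768.05, 782.25, 784.65, 786.72, 770.59]})

def remove_nan(DataFrame):
    # dropna(inplace=True) modifies the frame in place and returns None
    DataFrame.dropna(inplace=True)

# Percentage change between consecutive rows; the first row is NaN
df['returns'] = 100 * df['Close_mid'].pct_change()
remove_nan(df)
x_array = np.array(df['returns'])
x_array = x_array.reshape(-1, 1)
normalized_X = preprocessing.normalize(x_array)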
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
----> 5 normalized_X = preprocessing.normalize(x_array)
When I load the data file you linked to and describe the data set, I find:
>>> print(df.describe())
       Close_large    Close_mid  Close_small
count  3077.000000  3077.000000  3077.000000
mean    121.647894   432.353685   425.614839
std      30.558998   204.394909   201.607000
min      53.410000     0.000000     0.000000
25%      98.800000   273.090000   278.740000
50%     117.040000   330.580000   339.260000
75%     147.690000   598.490000   547.570000
max     179.370000   870.110000   929.140000
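The min row is of course the interesting one: a minimum of 0.0 in Close_mid and Close_small suggests there really is a division by zero when the percentage change is computed. The row after a 0.0 gets a return of inf, and dropna() removes NaN but not inf. The offending record is: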
>>> print(df[df['Close_mid'] == 0])
            Close_large  Close_mid  Close_small
Date
2008-06-06        96.76        0.0          0.0
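To see why a single zero is enough, here is a minimal sketch on a toy series. The five prices are made up for illustration (only the idea of a 0.0 in the middle comes from the data above), and replacing inf with NaN before dropping is one possible cleanup, not necessarily the right one for this data set.

```python
import numpy as np
import pandas as pd

# Toy prices (illustrative only) with a single 0.0, mimicking the 2008-06-06 record.
prices = pd.Series([752.69, 736.09, 0.0, 749.97, 761.68], name="Close_mid")

# Percentage change: the row AFTER the zero divides by 0.0 and becomes inf.
returns = 100 * prices.pct_change()
n_inf = int(np.isinf(returns).sum())

# dropna() removes only the leading NaN; the inf is still there.
after_dropna = returns.dropna()
still_inf = bool(np.isinf(after_dropna).any())

# One possible cleanup: turn +/-inf into NaN first, then drop the NaNs.
clean = returns.replace([np.inf, -np.inf], np.nan).dropna()

# Every remaining value is finite, so a finiteness check would now pass.
x_array = clean.to_numpy().reshape(-1, 1)
all_finite = bool(np.isfinite(x_array).all())

print(n_inf, still_inf, all_finite)
```

Note that the row holding the zero itself gets a return of -100, which is finite and therefore survives this cleanup. Whether that row should be kept is a separate decision about the data.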
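But you should probably first make sure the date columns are properly aligned (the date does not appear in this form in the data itself, and the date columns do not all have the same name).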
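The alignment step could be sketched as follows. The layout of the real file is not shown here, so everything below is an assumption: the column names `Date_large` and `Date_mid` are hypothetical, and all values except the 96.76 on 2008-06-06 are invented for illustration. The idea is to index each price series by its own date column and let pandas align them on the union of dates.

```python
import pandas as pd

# Hypothetical raw layout: each price series carries its own, differently named date column.
raw = pd.DataFrame({
    "Date_large": ["2008-06-05", "2008-06-06", "2008-06-09"],
    "Close_large": [97.10, 96.76, 96.20],
    "Date_mid": ["2008-06-05", "2008-06-09", "2008-06-10"],
    "Close_mid": [310.50, 309.80, 311.20],
})

def to_series(frame, date_col, value_col):
    """Return value_col as a Series indexed by the parsed dates in date_col."""
    pair = frame[[date_col, value_col]].dropna()
    return pd.Series(
        pair[value_col].to_numpy(),
        index=pd.to_datetime(pair[date_col]),
        name=value_col,
    )

# Outer-join the series on their dates; a date missing from one series shows up as NaN.
aligned = pd.concat(
    [
        to_series(raw, "Date_large", "Close_large"),
        to_series(raw, "Date_mid", "Close_mid"),
    ],
    axis=1,
).sort_index()
aligned.index.name = "Date"

print(aligned)
```

With this kind of alignment, a date that exists in one series but not another appears as NaN rather than 0.0, so dropna() can deal with it before the percentage change is computed.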