preprocessing.MinMaxScaler and preprocessing.normalize return a DataFrame of zeros
I have a DataFrame with floats as data, and I'd like to normalize the data, so first I convert it into int (otherwise I get the error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')). My code for normalizing:
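    import pandas as pd
    from sklearn import preprocessing

    def normalize_df():
        # my_df is a DataFrame of floats defined elsewhere
        x = my_df.values.astype(int)  # cast to int to get rid of the ValueError
        min_max_scaler = preprocessing.MinMaxScaler()
        x_scaled = min_max_scaler.fit_transform(x)
        df = pd.DataFrame(x_scaled)
        return df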
And my output is:
       0  1  2  3  4  5  6  7  8  9 ... 12 13 14 15 16 17 18 19 20 21
    0  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    1  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    2  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    3  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    4  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
What's happening here (assuming that my initial DataFrame contains the value 0 in some rows, but in less than 30% of the DataFrame)? How can I fix this bug and avoid the output of all zeros?
EDIT:
My data looks like this (there are many more columns and rows):
    36680     0  22498037  2266
        0  2218  22502676     0
    26141     0  22505885  4533
    39009     0  22520711  4600
    36237     0  22527171  5933
And I'm trying to scale the values to the range 0.0 to 1.0.
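For reference, this is the per-column formula MinMaxScaler applies with its default feature_range=(0, 1); the column minimum maps to 0.0 and the column maximum to 1.0:

$$x' = \frac{x - \min(x_{\text{col}})}{\max(x_{\text{col}}) - \min(x_{\text{col}})}$$

Taking the five sample rows above as if they were the whole data (an assumption, since the real DataFrame is larger), the first column has minimum 0 and maximum 39009, so 36680 would map to 36680 / 39009 ≈ 0.940.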
It's not a bug. It happens because you are converting NaN values into integers. Look how that works (on my machine):
    In [132]: a
    Out[132]: array([ nan,   1.,  nan])

    In [133]: a.astype(int)
    Out[133]: array([-9223372036854775808, 1, -9223372036854775808])
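So each NaN becomes a very small value compared to the other integers in your dataset. MinMaxScaler scales each column by its minimum and maximum, so in any column that contained a NaN this huge negative number becomes the minimum, and all the real values get squeezed together. This causes the incorrect scaling.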
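A minimal sketch of the effect, using one hypothetical column made of three of the first-column sample values plus one missing entry. Because the result of casting NaN to int is platform-dependent in NumPy, the sketch writes the int64 minimum from the session above in explicitly instead of casting a real NaN.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# What astype(int) turned each NaN into in the session above (the int64 minimum).
# Written out explicitly because NaN-to-int casting is platform-dependent.
SENTINEL = np.iinfo(np.int64).min

# Hypothetical column: three real values plus one former NaN.
x = np.array([[36680], [26141], [39009], [SENTINEL]], dtype=np.int64)

scaled = MinMaxScaler().fit_transform(x)
# The sentinel becomes the column minimum and maps to 0.0; the three real
# values all land within about 1e-14 of 1.0, so their differences vanish.
print(scaled.ravel())

# The same three values scaled without the sentinel spread across [0, 1].
proper = MinMaxScaler().fit_transform(x[:3].astype(float))
print(proper.ravel())
```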
To fix this problem, work with floats. The ValueError you originally saw was pointing at these missing values; casting to int only hides them. Before scaling, get rid of the NaNs with some imputation, or remove such incomplete samples altogether. Look at sklearn.preprocessing.Imputer (deprecated in scikit-learn 0.20 and removed in 0.22; newer versions provide sklearn.impute.SimpleImputer instead).