
In Python, how to replace NaN in a sparse csr_matrix

I have hstacked a sparse matrix and a DataFrame. The resulting csr_matrix contains NaN values.

My question is how to set these NaN values to 0.

X_train_1hc = sp.sparse.hstack([X_train_1hc, X_train_df.values]).tocsr()

When I pass X_train_1hc to a classifier, I get the error: Input contains NaN or infinity or a value too large for dtype('float')

1. Is there an option/function/hack to replace NaN values in a sparse matrix? This is a conceptual question, so no data is provided.

Expanding a bit on Martin's answer, here is one way to do it. Assume you have a csr_matrix with some NaN values:

>>> Asp.todense()
matrix([[0.37512508,        nan, 0.34919696, 0.10321203],
        [0.48744859, 0.07289436, 0.16881342, 0.57637166],
        [0.37742037, 0.01425494, 0.38536847, 0.23799655],
        [0.95520474, 0.97719059,        nan, 0.22877082]])

Since the csr_matrix keeps its explicitly stored (nonzero) entries in the data attribute, you need to manipulate that array. To replace all occurrences of NaN by 0 and of inf by a very large number (in fact the largest representable one), you can do

>>> Asp.data = np.nan_to_num(Asp.data, copy=False)
>>> Asp.todense()
matrix([[0.37512508, 0.        , 0.34919696, 0.10321203],
        [0.48744859, 0.07289436, 0.16881342, 0.57637166],
        [0.37742037, 0.01425494, 0.38536847, 0.23799655],
        [0.95520474, 0.97719059, 0.        , 0.22877082]])
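Note that np.nan_to_num also rewrites infinities, not just NaN. Here is a minimal self-contained sketch of that behaviour, assuming a float64 matrix; the 2x3 array is made up for illustration and is not the matrix shown above.

```python
import numpy as np
from scipy import sparse

# Hypothetical example data (not from the question): NaN, +inf and -inf
# all count as "nonzero", so they end up in the stored entries.
dense = np.array([[1.0, np.nan, 0.0],
                  [np.inf, 2.0, -np.inf]])
Asp = sparse.csr_matrix(dense)

# Only the stored entries live in Asp.data, so the implicit zeros are untouched.
# NaN -> 0.0, +inf -> largest float64, -inf -> most negative float64.
Asp.data = np.nan_to_num(Asp.data, copy=False)

result = Asp.toarray()
print(result)
```

If the huge finite values that replace inf are not what you want, the manual NaN-only replacement is probably the safer choice.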

Alternatively, you can replace just the NaNs manually like this:

>>> Asp.data[np.isnan(Asp.data)] = 0.0
>>> Asp.todense()
matrix([[0.37512508, 0.        , 0.34919696, 0.10321203],
        [0.48744859, 0.07289436, 0.16881342, 0.57637166],
        [0.37742037, 0.01425494, 0.38536847, 0.23799655],
        [0.95520474, 0.97719059, 0.        , 0.22877082]])
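One side effect of both approaches: the replaced entries remain in the data array as explicitly stored zeros. If that matters for memory or for code that relies on nnz, eliminate_zeros() should drop them. A small sketch, again with a made-up 2x2 matrix:

```python
import numpy as np
from scipy import sparse

# Hypothetical example data (not from the question).
dense = np.array([[0.5, np.nan],
                  [0.0, 0.25]])
Asp = sparse.csr_matrix(dense)

# Replace NaN in the stored entries; the entry stays stored, now as 0.0.
Asp.data[np.isnan(Asp.data)] = 0.0
stored_before = Asp.nnz   # 3 stored values, one of them an explicit zero

# Remove explicitly stored zeros in place.
Asp.eliminate_zeros()
stored_after = Asp.nnz    # 2 stored values

print(stored_before, stored_after)
```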

