Operations on numpy masked arrays give invalid values masked

From the numpy documentation on operations on masked arrays:

The numpy.ma module comes with a specific implementation of most ufuncs. Unary and binary functions that have a validity domain (such as log or divide) return the masked constant whenever the input is masked or falls outside the validity domain, e.g.:

ma.log([-1, 0, 1, 2])
masked_array(data = [-- -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

I have the problem that for my calculations I need to know where those invalid operations were produced. Concretely, I would like this instead:

ma.log([-1, 0, 1, 2])
masked_array(data = [np.nan -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

At the risk of this question being conversational, my main question is:

What is a good solution to get this masked_array where the computed invalid values (those "fixed" by fix_invalid, like np.nan and np.inf) are not turned into (and conflated with) masked values?

My current solution would be to compute the function on masked_array.data and then reconstruct the masked array with the original mask. However, I am writing an application which maps arbitrary functions from the user onto many different arrays, some of which are masked and some of which aren't, and I am looking to avoid a special handler just for masked arrays. Furthermore, these arrays have a distinction between MISSING, NaN, and Inf that is important, so I can't just use an array with np.nan values instead of masked values.
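A minimal sketch of that workaround, for reference. The helper name `apply_keep_mask` is hypothetical and not part of numpy; it only relies on `np.ma.getdata` and `np.ma.getmask`, which accept both plain and masked arrays, so one code path covers both kinds of input.

```python
import numpy as np

def apply_keep_mask(func, arr):
    """Apply func to the raw data and reattach only the original mask.

    Hypothetical helper: NaN/Inf produced by func stay visible in the
    result instead of being converted into masked entries.
    """
    data = np.ma.getdata(arr)   # underlying ndarray (plain or masked input)
    mask = np.ma.getmask(arr)   # np.ma.nomask when nothing is masked
    with np.errstate(divide="ignore", invalid="ignore"):
        result = func(data)
    if mask is np.ma.nomask:
        return result
    return np.ma.masked_array(result, mask=mask)

x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[True, False, False, False])
y = apply_keep_mask(np.log, x)
print(y.mask)   # only the originally masked slot is masked
print(y.data)   # raw results, including -inf for log(0)
```

The drawback is the one described above: every caller has to go through the helper rather than calling the user's function directly.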


Additionally, if anyone has any perspective on why this behavior exists, I would like to know. It seems strange to have this in the same operation, because the validity of the results of an operation on unmasked values is really the responsibility of the user, who can then choose to "clean up" by using the fix_invalid function.

Furthermore, if anyone knows anything about the progress of missing values in numpy, please share, as the oldest posts are from 2011-2012, when there was a debate that never resulted in anything.


EDIT: 2017-10-30

To add to hpaulj's answer: defining the log function with a modified domain has side effects on the behavior of log in the numpy namespace.

In [1]: import numpy as np

In [2]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[2]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

In [3]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)

In [4]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[4]: 
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)

np.log now has the same behavior as mylog, but np.ma.log is unchanged:

In [5]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[5]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

Is there a way to avoid this?

Using Python 3.6.2 :: Anaconda custom (64-bit) and numpy 1.12.1

Just to clarify what appears to be going on here:

np.ma.log runs np.log on the argument, but it traps the warnings:

In [26]: np.log([-1,0,1,2])
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[26]: array([        nan,        -inf,  0.        ,  0.69314718])

It masks the nan and -inf values, and apparently copies the original values into these data slots:

In [27]: np.ma.log([-1,0,1,2])
Out[27]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [28]: _.data
Out[28]: array([-1.        ,  0.        ,  0.        ,  0.69314718])

(running in Py3; numpy version 1.13.1)
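Since the masked version discards the nan/-inf results, one way to recover where the invalid operations happened is to compare the output mask with the input mask. This is only a sketch; the variable names are illustrative.

```python
import numpy as np

x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[True, False, False, False])
y = np.ma.log(x)

# Positions masked in the result but not in the input were masked by the
# operation itself (out-of-domain input), not because the value was missing.
newly_masked = np.ma.getmaskarray(y) & ~np.ma.getmaskarray(x)

# The raw ufunc result at those positions tells NaN apart from +/-Inf.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = np.log(np.ma.getdata(x))

print(newly_masked)
print(raw[newly_masked])
```

Here log(0) is flagged as newly masked, while the -1 entry is not, because it was already masked in the input.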

This masking behavior is not unique to ma.log. It is determined by its class:

In [41]: type(np.ma.log)
Out[41]: numpy.ma.core._MaskedUnaryOperation

In np.ma.core it is defined with fill and domain attributes:

log = _MaskedUnaryOperation(umath.log, 1.0,
                        _DomainGreater(0.0))

So the valid (unmasked) domain is > 0; the domain function returns True where the input falls outside it:

In [47]: np.ma.log.domain([-1,0,1,2])
Out[47]: array([ True,  True, False, False], dtype=bool)

That domain mask is or-ed with

In [54]: ~np.isfinite(np.log([-1,0,1,2]))
...
Out[54]: array([ True,  True, False, False], dtype=bool)

which has the same values.

Looks like I could define a custom log that does not add its own domain masking:

In [58]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)
In [59]: mylog([-1,0,1,2])
Out[59]: 
masked_array(data = [        nan        -inf  0.          0.69314718],
             mask = False,
       fill_value = 1e+20)

In [63]: np.ma.masked_array([-1,0,1,2],[1,0,0,0])
Out[63]: 
masked_array(data = [-- 0 1 2],
             mask = [ True False False False],
       fill_value = 999999)
In [64]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[64]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [65]: mylog(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[65]: 
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)
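On the side effect reported in the question's edit: as far as I can tell from reading numpy/ma/core.py, the _MaskedUnaryOperation constructor also records its domain and fill in the module-level dictionaries ufunc_domain and ufunc_fills, keyed by the wrapped ufunc, and np.log on a masked array consults those dictionaries. Creating mylog without a domain would therefore overwrite the entry for np.log. The sketch below saves and restores the entries around the construction. It is an assumption about numpy internals: these names are private and may change between versions.

```python
import numpy as np
import numpy.ma.core as mcore

# Remember what np.ma registered for np.log before building the custom op.
saved_domain = mcore.ufunc_domain.get(np.log)
saved_fill = mcore.ufunc_fills.get(np.log)

# Custom masked log without domain masking (private class, as above).
mylog = mcore._MaskedUnaryOperation(np.log)

# Put the original registry entries back so np.log(masked) is unaffected.
mcore.ufunc_domain[np.log] = saved_domain
mcore.ufunc_fills[np.log] = saved_fill

x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[True, False, False, False])
with np.errstate(divide="ignore", invalid="ignore"):
    via_np = np.log(x)      # domain masking restored
via_custom = mylog(x)       # only the original mask is kept

print(via_np.mask)
print(via_custom.mask)
```

mylog keeps working after the restore because it uses the domain stored on the instance, not the shared dictionary.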
