Python comparison ignoring nan

Question

While nan == nan is always False, in many cases people want to treat them as equal, and this is enshrined in pandas.DataFrame.equals:

NaNs in the same location are considered equal.

Of course, I can write

import math

def equalp(x, y):
    return (x == y) or (math.isnan(x) and math.isnan(y))

However, this will fail on containers like [float("nan")], and isnan barfs (raises TypeError) on non-numbers, so the complexity increases.

So, how do people compare complex Python objects that may contain nan?

P.S. Motivation: when comparing two rows of a pandas DataFrame, I convert them to dicts and compare the dicts element-wise.

P.P.S. When I say "compare", I am thinking diff, not equalp.

Answer 1

Suppose you have a DataFrame with nan values:

In [10]: df = pd.DataFrame(np.random.randint(0, 20, (10, 10)).astype(float), columns=["c%d"%d for d in range(10)])

In [10]: df.where(np.random.randint(0,2, df.shape).astype(bool), np.nan, inplace=True)

In [10]: df
Out[10]:
     c0    c1    c2    c3    c4    c5    c6    c7   c8    c9
0   NaN   6.0  14.0   NaN   5.0   NaN   2.0  12.0  3.0   7.0
1   NaN   6.0   5.0  17.0   NaN   NaN  13.0   NaN  NaN   NaN
2   NaN  17.0   NaN   8.0   6.0   NaN   NaN  13.0  NaN   NaN
3   3.0   NaN   NaN  15.0   NaN   8.0   3.0   NaN  3.0   NaN
4   7.0   8.0   7.0   NaN   9.0  19.0   NaN   0.0  NaN  11.0
5   NaN   NaN  14.0   2.0   NaN   NaN   0.0   NaN  NaN   8.0
6   3.0  13.0   NaN   NaN   NaN   NaN   NaN  12.0  3.0   NaN
7  13.0  14.0   NaN   5.0  13.0   NaN  18.0   6.0  NaN   5.0
8   3.0   9.0  14.0  19.0  11.0   NaN   NaN   NaN  NaN   5.0
9   3.0  17.0   NaN   NaN   0.0   NaN  11.0   NaN  NaN   0.0

Now you want to compare two rows, for example row 0 and row 8. Just use fillna and do a vectorized comparison:

In [12]: df.iloc[0,:].fillna(0) != df.iloc[8,:].fillna(0)
Out[12]:
c0     True
c1     True
c2    False
c3     True
c4     True
c5    False
c6     True
c7     True
c8     True
c9     True
dtype: bool

If you only want to know which columns differ, you can use the resulting boolean array to index the columns:

In [14]: df.columns[df.iloc[0,:].fillna(0) != df.iloc[8,:].fillna(0)]
Out[14]: Index(['c0', 'c1', 'c3', 'c4', 'c6', 'c7', 'c8', 'c9'], dtype='object')
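One caveat with fillna(0): a nan in one row and a genuine 0.0 in the other become equal after filling. A minimal sketch that avoids choosing a fill value, treating a position as "equal" only when the values match or both are NaN. The helper name nan_aware_ne and the small sample rows are hypothetical; only Series.isna and element-wise operators are used:

```python
import numpy as np
import pandas as pd

def nan_aware_ne(a: pd.Series, b: pd.Series) -> pd.Series:
    """Element-wise 'not equal' where NaN in both positions counts as equal."""
    both_nan = a.isna() & b.isna()
    # NaN != NaN is True, so switch those positions back off
    return (a != b) & ~both_nan

# Hypothetical sample rows: c1 is NaN vs 0.0, c2 is NaN in both
row_a = pd.Series([1.0, np.nan, np.nan], index=["c0", "c1", "c2"])
row_b = pd.Series([1.0, 0.0, np.nan], index=["c0", "c1", "c2"])

diff = nan_aware_ne(row_a, row_b)
print(list(diff.index[diff]))  # ['c1']
```

With the DataFrame above, the same helper can replace the fillna comparison, e.g. df.columns[nan_aware_ne(df.iloc[0, :], df.iloc[8, :])].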

Answer 2

I assume you have array data, or can at least convert it to a numpy array?

One way is to mask all the nans using a numpy.ma masked array and then compare the arrays. So your starting situation would be something like this:

import numpy as np
import numpy.ma as ma

arr1 = ma.array([3, 4, 6, np.nan, 2])
arr2 = ma.array([3, 4, 6, np.nan, 2])

print(arr1 == arr2)          # [ True  True  True False  True]
print(ma.all(arr1 == arr2))  # False  <-- you want this to show True

Workaround:

# Mask every position that holds a nan
arr1[np.isnan(arr1)] = ma.masked
arr2[np.isnan(arr2)] = ma.masked

print(arr1 == arr2)          # [True True True -- True]
print(ma.all(arr1 == arr2))  # True
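One thing to watch with the masking approach: ma.all skips masked positions, so if only one of the arrays has a nan at some index, that position is masked in the comparison and silently ignored. If a yes/no answer is enough, np.array_equal accepts equal_nan=True (NumPy 1.19 and later), which treats nans as equal only when both arrays have them in the same place. A sketch contrasting the two on made-up arrays; it uses ma.masked_invalid as a shortcut for the manual masking above, noting that it also masks +/-inf:

```python
import numpy as np
import numpy.ma as ma

a = np.array([3.0, 4.0, np.nan])
b = np.array([3.0, 4.0, 5.0])  # differs from a only where a has a nan

# Masking approach: the comparison is masked wherever either side is masked,
# and ma.all ignores masked positions, so the mismatch at index 2 is lost.
masked_result = bool(ma.all(ma.masked_invalid(a) == ma.masked_invalid(b)))
print(masked_result)  # True, although a and b differ

# array_equal with equal_nan=True only treats nan == nan at matching positions.
print(np.array_equal(a, b, equal_nan=True))         # False
print(np.array_equal(a, a.copy(), equal_nan=True))  # True
```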

Answer 3

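Here is a function that recurses into a data structure and replaces nan values with a unique string. I wrote it for a unit test that compares data structures which may contain nan.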

It is only designed for data structures made of dicts and lists, but it is easy to see how to extend it.

from math import isnan
from uuid import uuid4
from typing import Union

NAN_REPLACEMENT = f"THIS_WAS_A_NAN{uuid4()}"

def replace_nans(data_structure: Union[dict, list]) -> Union[dict, list]:
    """Recursively replace float nans with NAN_REPLACEMENT.

    Note: modifies data_structure in place and also returns it.
    """
    if isinstance(data_structure, dict):
        iterme = data_structure.items()
    elif isinstance(data_structure, list):
        iterme = enumerate(data_structure)
    else:
        raise ValueError(
            "replace_nans should only be called on structures made of dicts and lists"
        )

    for key, value in iterme:
        if isinstance(value, float) and isnan(value):
            # Reassigning an existing key/index is safe while iterating
            data_structure[key] = NAN_REPLACEMENT
        elif isinstance(value, (dict, list)):
            # Descend into nested containers
            data_structure[key] = replace_nans(value)
    return data_structure
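If you would rather not mutate the input or rely on a sentinel string, the same recursion can compare two structures directly, and a small wrapper gives the dict "diff" the question asks about. A minimal sketch, assuming only dicts, lists and tuples need to be descended into; the names nan_equal and nan_diff are hypothetical:

```python
import math
from typing import Any

def _is_nan(value: Any) -> bool:
    # Check the type first so math.isnan never sees a non-number
    return isinstance(value, float) and math.isnan(value)

def nan_equal(x: Any, y: Any) -> bool:
    """Recursive equality in which nan compares equal to nan."""
    if _is_nan(x) and _is_nan(y):
        return True
    if isinstance(x, dict) and isinstance(y, dict):
        return x.keys() == y.keys() and all(nan_equal(x[k], y[k]) for k in x)
    if isinstance(x, (list, tuple)) and isinstance(y, (list, tuple)):
        return (type(x) is type(y) and len(x) == len(y)
                and all(nan_equal(a, b) for a, b in zip(x, y)))
    return x == y

def nan_diff(a: dict, b: dict) -> set:
    """Keys whose values differ between two dicts, treating nan == nan."""
    return {k for k in a.keys() | b.keys()
            if k not in a or k not in b or not nan_equal(a[k], b[k])}

nan = float("nan")
print(nan_equal([nan], [float("nan")]))                             # True
print(nan_equal({"a": nan}, {"a": 0.0}))                            # False
print(nan_diff({"x": 1.0, "y": nan, "z": 2.0}, {"x": 1.0, "y": nan, "z": 3.0}))  # {'z'}
```

Because numpy.float64 subclasses float, values taken from a DataFrame row via to_dict() are also recognized as nan by this check.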

Answers: 3

Answer 1: 8 votes, accepted, 2018-01-26 03:05:27

Answer 2: 2 votes, 2018-01-25 22:37:42

Answer 3: 0 votes, 2021-02-04 16:14:28
