Custom function to compute the mean absolute deviation

Question

I have a 4D numpy array similar to this:

>>> import numpy as np
>>> from functools import partial

>>> X = np.random.rand(20, 1, 10, 4)

>>> X.shape
(20, 1, 10, 4)

I compute the following statistics: mean, std, median, p25 and p75.

>>> percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles

>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

so that:

>>> stats.shape
(20, 1, 5, 4)

>>> stats[0]
array([[[0.55187202, 0.55892688, 0.45816177, 0.6378181 ],
        [0.31028278, 0.32109677, 0.17319351, 0.13341651],
        [0.57112019, 0.60587194, 0.45490572, 0.59787335],
        [0.30857011, 0.30367621, 0.28899686, 0.55742753],
        [0.80678815, 0.82014851, 0.61295181, 0.70529412]]])
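The sketch below reruns the question's pipeline and prints the shape of each block, which shows where (20, 1, 5, 4) comes from. With keepdims=True, every reduction over axis 2 keeps that axis with length 1. Concatenating five such blocks along axis 2 therefore produces 5 rows, one per statistic, in the order of stat_functions. The random data is only a stand-in for the real X.

```python
import numpy as np
from functools import partial

X = np.random.rand(20, 1, 10, 4)

# Same statistic functions as in the question
percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median) + percentiles

# With keepdims=True each reduction over axis 2 keeps a length-1 axis
shapes = [func(X, axis=2, keepdims=True).shape for func in stat_functions]
print(shapes)  # every entry is (20, 1, 1, 4)

# Five (20, 1, 1, 4) blocks joined along axis 2 give (20, 1, 5, 4)
stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
print(stats.shape)  # (20, 1, 5, 4)
```

Row 0 of axis 2 is the mean, row 1 the std, and so on.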

I'm also interested in the MAD (mean absolute deviation), so I defined the function below, since NumPy doesn't provide one.

def mad(data):
    mean = np.mean(data)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(data))) / len(data)

But I'm having trouble getting this function to work. First I tried:

>>> stat_functions = (np.mean, np.std, np.median, mad) + percentiles
>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-33-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<ipython-input-33-fa6d972f0fce> in <listcomp>(.0)
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

TypeError: mad() got an unexpected keyword argument 'axis'

Then I changed the definition of mad to:

def mad(data, axis=None):
    ...

and ran into this problem:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<ipython-input-35-c74d9e3d057b> in <listcomp>(.0)
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

TypeError: mad() got an unexpected keyword argument 'keepdims'

So I did this as well:

def mad(data, axis=None, keepdims=None):
    ...

which leads to:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)

I know this has to do with the dimensions, but I'm not sure how to fix it in this case.
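The bodies of the modified mad are elided above, so the sketch below uses a hypothetical stand-in, mad_ignoring_keepdims. It accepts keepdims but never passes it to its final reduction. That is enough to reproduce the error. np.mean with keepdims=True returns a 4-D array, the stand-in returns a 3-D array, and np.concatenate refuses to mix the two. Putting the reduced axis back, here with np.expand_dims, makes the shapes compatible again.

```python
import numpy as np

X = np.random.rand(20, 1, 10, 4)

# A reduction that honours keepdims=True keeps axis 2 as length 1: 4 dimensions
kept = np.mean(X, axis=2, keepdims=True)

# Hypothetical mad that accepts keepdims but ignores it: axis 2 is dropped
def mad_ignoring_keepdims(data, axis=None, keepdims=None):
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis)

dropped = mad_ignoring_keepdims(X, axis=2, keepdims=True)
print(kept.shape, dropped.shape)  # (20, 1, 1, 4) (20, 1, 4)

# Mixing a 4-D and a 3-D array makes np.concatenate raise the ValueError above
try:
    np.concatenate([kept, dropped], axis=2)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Re-inserting the reduced axis restores a 4-D shape that concatenates cleanly
restored = np.expand_dims(dropped, axis=2)
print(np.concatenate([kept, restored], axis=2).shape)  # (20, 1, 2, 4)
```

Forwarding keepdims to the last reduction inside mad is the cleaner fix. np.expand_dims only patches the result afterwards.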

Edit:

After switching to the mad function from the answer, I get a strange result:

stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

stats.shape
(20, 1, 15, 4)

The expected output should have shape (20, 1, 6, 4), since I added only one statistic along the third dimension: (np.mean, np.std, np.median, mad) + percentiles
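The exact mad used in this edit isn't shown, so the following is one plausible explanation rather than a diagnosis. Suppose the function returns the per-element deviations |x - mean| without reducing them. Its block then keeps all 10 samples along axis 2 instead of collapsing them to 1, and 3 + 10 + 2 = 15. The sketch below checks that arithmetic with a hypothetical mad_elementwise.

```python
import numpy as np
from functools import partial

X = np.random.rand(20, 1, 10, 4)
percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))

# Hypothetical mad that scales the deviations but never reduces over `axis`
def mad_elementwise(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)) / data.shape[axis]

stat_functions = (np.mean, np.std, np.median, mad_elementwise) + percentiles
blocks = [func(X, axis=2, keepdims=True) for func in stat_functions]

# Length of axis 2 for each block: the mad block still has all 10 samples
print([b.shape[2] for b in blocks])  # [1, 1, 1, 10, 1, 1]

stats = np.concatenate(blocks, axis=2)
print(stats.shape)  # (20, 1, 15, 4)
```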

Edit 2:

Using this function from the answer:

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis)

and then:

stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

I run into this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)

Answer 1

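The first thing I noticed in your code is that vf is not really a vectorized function. See the note in the NumPy documentation for np.vectorize: it is provided for convenience, not performance, and is essentially a for loop. You can simply use np.abs instead of abs, and your function will be vectorized.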
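The comparison below runs both versions on the same 1-D sample. The np.vectorize version still calls the Python lambda element by element. np.abs(data - mean) is a single ufunc expression over the whole array. Both produce the same values, so swapping in np.abs changes speed, not results. The sample size of 1000 is arbitrary.

```python
import numpy as np

data = np.random.rand(1000)
mean = np.mean(data)

# np.vectorize wraps a Python function; evaluation is essentially a Python loop
vf = np.vectorize(lambda x: abs(x - mean))
slow = vf(data)

# np.abs is a true ufunc and operates on the whole array at once
fast = np.abs(data - mean)

print(np.allclose(slow, fast))  # True
```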

That said, your function can be written as:

def mad(data):
    # sum of |x - mean| over axis 0, divided by the number of samples
    return np.abs(data - data.mean(0)).sum(0) / len(data)
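Here is a quick check that the ufunc rewrite computes the same statistic as the question's original on 1-D input. Note the .sum(0) in mad_vectorized. It mirrors the original's np.add.reduce, which sums the deviations before dividing by len(data). The function names and the hand-picked sample are only for illustration. The mean is 4, the absolute deviations are 3, 2, 1, 0, 6, and their average is 12 / 5 = 2.4.

```python
import numpy as np

def mad_original(data):
    # The question's version: np.vectorize plus np.add.reduce
    mean = np.mean(data)
    vf = np.vectorize(lambda x: abs(x - mean))
    return np.add.reduce(vf(data)) / len(data)

def mad_vectorized(data):
    # Same statistic with ufuncs: sum of |x - mean| over axis 0, divided by n
    return np.abs(data - data.mean(0)).sum(0) / len(data)

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
print(mad_original(x), mad_vectorized(x))  # 2.4 2.4
```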

Now, note that this mad function, like your original one, takes a single positional argument and no optional ones. The error you get is because you try to pass axis=2 to mad:

[func(X, axis=2, keepdims=True) for func in stat_functions]

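To fix this, give the function optional arguments. Divide by the length of the reduced axis, data.shape[axis], rather than len(data), which is the size of axis 0:

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).sum(axis, keepdims=keepdims) / data.shape[axis]

Or, more simply, use mean(axis) instead of dividing the sum by hand. Forward keepdims to this final reduction as well. Otherwise the result loses the reduced axis, and np.concatenate fails with the dimension mismatch shown in the question:

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis, keepdims=keepdims)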
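The sketch below plugs a keepdims-forwarding mad into the question's pipeline. It shows the stacked result reaching the expected (20, 1, 6, 4). It also cross-checks one cell of the mad row (index 3 on axis 2) against a plain 1-D computation. The random X stands in for the real data.

```python
import numpy as np
from functools import partial

def mad(data, axis=-1, keepdims=True):
    # Mean absolute deviation along `axis`; keepdims is forwarded to the final mean
    center = data.mean(axis, keepdims=True)
    return np.abs(data - center).mean(axis, keepdims=keepdims)

X = np.random.rand(20, 1, 10, 4)
percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
print(stats.shape)  # (20, 1, 6, 4)

# Cross-check one cell of the mad row against a 1-D computation
cell = X[0, 0, :, 0]
print(np.isclose(stats[0, 0, 3, 0], np.mean(np.abs(cell - cell.mean()))))  # True
```

The inner mean always uses keepdims=True so that data - center broadcasts correctly for any axis. Only the outer reduction follows the caller's keepdims.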

Solution 1
0 2020-09-21 15:39:04