
Custom function to compute mean absolute deviation

I have a 4D numpy array similar to this:

>>> import numpy as np
>>> from functools import partial

>>> X = np.random.rand(20, 1, 10, 4)

>>> X.shape
(20, 1, 10, 4)

I compute the following statistics along axis 2, in this order: mean, std, median, p25, p75.

>>> percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles

>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

So that:

>>> stats.shape
(20, 1, 5, 4)

>>> stats[0]
array([[[0.55187202, 0.55892688, 0.45816177, 0.6378181 ],
        [0.31028278, 0.32109677, 0.17319351, 0.13341651],
        [0.57112019, 0.60587194, 0.45490572, 0.59787335],
        [0.30857011, 0.30367621, 0.28899686, 0.55742753],
        [0.80678815, 0.82014851, 0.61295181, 0.70529412]]])
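Each statistic is called with axis=2 and keepdims=True, so it collapses the 10 samples along axis 2 to one value while keeping that axis with length 1. Concatenating the five (20, 1, 1, 4) pieces along axis 2 then gives (20, 1, 5, 4). A small sketch that checks these shapes, using a seeded np.random.default_rng generator in place of np.random.rand (an assumption, only for reproducibility):

```python
import numpy as np
from functools import partial

rng = np.random.default_rng(0)
X = rng.random((20, 1, 10, 4))

percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median) + percentiles

# keepdims=True keeps the reduced axis as length 1, so every result stays 4-D
pieces = [func(X, axis=2, keepdims=True) for func in stat_functions]
for piece in pieces:
    print(piece.shape)  # (20, 1, 1, 4) for every statistic

stats = np.concatenate(pieces, axis=2)
print(stats.shape)  # (20, 1, 5, 4)
```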

I'm also interested in the MAD (mean absolute deviation), so I defined this function, since NumPy doesn't provide one:

def mad(data):
    mean = np.mean(data)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(data))) / len(data)
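For a 1-D input this function computes the mean absolute deviation, i.e. the average of |x_i - mean(x)|. A quick check on a small hand-picked array (the values are illustrative only), next to the same quantity written with plain NumPy operations. On a 4-D array, however, the function uses the global mean and np.add.reduce sums along axis 0, so it does not reduce along axis 2:

```python
import numpy as np

def mad(data):
    # The question's original function, reproduced for comparison
    mean = np.mean(data)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(data))) / len(data)

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
# mean = 4.0; absolute deviations = 3, 2, 1, 0, 6; their mean = 12 / 5
print(mad(x))                         # 2.4
print(np.mean(np.abs(x - x.mean())))  # 2.4, same value without np.vectorize

# On a 4-D array: global mean, sum along axis 0, so axis 0 disappears instead of axis 2
X = np.random.default_rng(0).random((20, 1, 10, 4))
print(mad(X).shape)                   # (1, 10, 4)
```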

But I'm having trouble getting this function to work. First I tried:

>>> stat_functions = (np.mean, np.std, np.median, mad) + percentiles
>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-33-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<ipython-input-33-fa6d972f0fce> in <listcomp>(.0)
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

TypeError: mad() got an unexpected keyword argument 'axis'

Then I changed the definition of mad to:

def mad(data, axis=None):
    ...

and ran into this problem:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<ipython-input-35-c74d9e3d057b> in <listcomp>(.0)
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

TypeError: mad() got an unexpected keyword argument 'keepdims'

So I added that parameter as well:

def mad(data, axis=None, keepdims=None):
    ...

which leads to:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)

I know this is a dimension issue, but I'm not sure how to fix it in this case.
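One way to reuse a 1-D statistic in this pipeline is to wrap it so it accepts the same axis and keepdims keywords as the NumPy reductions and returns a 4-D array. A minimal sketch using np.apply_along_axis; the wrapper with_axis and the helper mad_1d are hypothetical names introduced only for illustration. This is a swapped-in technique that loops over every 1-D slice, so it is slower than a fully vectorized reduction:

```python
import numpy as np
from functools import partial

def mad_1d(values):
    # Hypothetical helper: mean absolute deviation of a 1-D array
    return np.mean(np.abs(values - np.mean(values)))

def with_axis(func1d):
    # Hypothetical adapter: give a 1-D function NumPy-style axis/keepdims keywords
    def wrapped(data, axis=-1, keepdims=False):
        result = np.apply_along_axis(func1d, axis, data)
        # apply_along_axis removes `axis` when func1d returns a scalar; restore it
        return np.expand_dims(result, axis) if keepdims else result
    return wrapped

X = np.random.default_rng(0).random((20, 1, 10, 4))
percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median, with_axis(mad_1d)) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
print(stats.shape)  # (20, 1, 6, 4)
```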

Edit:

After switching to the mad function from the answer, I get an unexpected result:

stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

stats.shape
(20, 1, 15, 4)

The expected output should have shape (20, 1, 6, 4), since I added only one statistic along the third dimension: (np.mean, np.std, np.median, mad) + percentiles.
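A size of 15 along axis 2 is consistent with a mad that never reduces along axis 2: if it returns the full (20, 1, 10, 4) array of deviations, the five single-slice statistics plus those 10 slices give 5 + 10 = 15. A hypothetical illustration (mad_no_reduce is a made-up name, not necessarily the answer's exact code at the time):

```python
import numpy as np
from functools import partial

def mad_no_reduce(data, axis=-1, keepdims=True):
    # Hypothetical buggy variant: scales the deviations but never sums or
    # averages them, so no axis is removed and the shape is unchanged
    return np.abs(data - data.mean(axis, keepdims=True)) / len(data)

X = np.random.default_rng(0).random((20, 1, 10, 4))
percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median, mad_no_reduce) + percentiles

pieces = [func(X, axis=2, keepdims=True) for func in stat_functions]
print([p.shape[2] for p in pieces])          # [1, 1, 1, 10, 1, 1]
print(np.concatenate(pieces, axis=2).shape)  # (20, 1, 15, 4)
```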

Edit 2

Using this function from the answer:

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis)

and then:

stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

I run into this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)
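The "4 vs 3 dimensions" message points at the outer .mean(axis): it is called without keepdims, so it removes axis 2 even though the caller passed keepdims=True. A sketch comparing the two output shapes (both function names are hypothetical):

```python
import numpy as np

X = np.random.default_rng(0).random((20, 1, 10, 4))

def mad_drops_axis(data, axis=-1, keepdims=True):
    # keepdims is accepted but never used, so the outer mean removes `axis`
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis)

def mad_keeps_axis(data, axis=-1, keepdims=True):
    # Forwarding keepdims to the outer mean leaves a length-1 axis in place
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis, keepdims=keepdims)

print(mad_drops_axis(X, axis=2, keepdims=True).shape)  # (20, 1, 4): 3-D
print(mad_keeps_axis(X, axis=2, keepdims=True).shape)  # (20, 1, 1, 4): 4-D
```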

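The first thing I notice in your code is that vf is not really a vectorized function: as the note in the NumPy documentation for np.vectorize says, it is provided primarily for convenience, not for performance, and is essentially a for loop. You can simply use np.abs instead of abs, and your function will be vectorized.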
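A short comparison, assuming a 1-D float array: np.vectorize calls the Python lambda once per element, whereas np.abs is a ufunc applied to the whole array at once. Both produce the same values; only the mechanism (and therefore the speed) differs:

```python
import numpy as np

data = np.random.default_rng(0).random(1000)
mean = np.mean(data)

# np.vectorize: convenience wrapper that still loops in Python, one call per element
vf = np.vectorize(lambda x: abs(x - mean))
looped = vf(data)

# np.abs: a true ufunc operating on the whole array
vectorized = np.abs(data - mean)

print(np.allclose(looped, vectorized))  # True
```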

That said, your function can be written as:

def mad(data):
    # Global mean; absolute deviations summed along axis 0 and divided by len(data)
    return np.abs(data - np.mean(data)).sum(axis=0) / len(data)

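Now note that this mad function, like your original one, accepts a single positional argument and no keyword arguments. The errors you got come from passing axis=2 and keepdims=True to mad in: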

[func(X, axis=2, keepdims=True) for func in stat_functions]

To fix this, define the function with optional arguments:

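def mad(data, axis=-1, keepdims=True):
    # Inner mean keeps the axis so it broadcasts; divide by the size of `axis`, not len(data)
    return np.abs(data - data.mean(axis, keepdims=True)).sum(axis, keepdims=keepdims) / data.shape[axis]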
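Note that len(data) is the size of axis 0, not of the reduced axis. A small check, assuming the question's (20, 1, 10, 4) layout with axis=2: dividing the summed deviations by len(X) divides by 20 instead of 10, so the result is exactly half the mean absolute deviation, while dividing by X.shape[2] matches .mean(axis=2):

```python
import numpy as np

X = np.random.default_rng(0).random((20, 1, 10, 4))
dev = np.abs(X - X.mean(axis=2, keepdims=True))

by_len = dev.sum(axis=2, keepdims=True) / len(X)       # divides by 20 (size of axis 0)
by_axis = dev.sum(axis=2, keepdims=True) / X.shape[2]  # divides by 10 (size of axis 2)

print(len(X), X.shape[2])                                 # 20 10
print(np.allclose(by_len * 2, by_axis))                   # True
print(np.allclose(by_axis, dev.mean(axis=2, keepdims=True)))  # True
```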

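Or, more simply, use mean(axis) instead of summing and dividing by the sample count. Also pass keepdims through to the outer reduction: without it, the reduced axis is dropped, mad returns a 3-D array, and np.concatenate raises the dimension-mismatch ValueError shown in Edit 2.

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis, keepdims=keepdims)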
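A sketch of the full pipeline with a mad that forwards keepdims, assuming the question's (20, 1, 10, 4) input (seeded here for reproducibility). The stacked result should have six statistics along axis 2, and the MAD slot (index 3) can be cross-checked against an explicit computation on one 1-D slice:

```python
import numpy as np
from functools import partial

def mad(data, axis=-1, keepdims=True):
    # Mean absolute deviation along `axis`; keepdims is forwarded to the outer mean
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis, keepdims=keepdims)

X = np.random.default_rng(0).random((20, 1, 10, 4))
percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
print(stats.shape)  # (20, 1, 6, 4)

# Cross-check one entry against a plain 1-D computation
col = X[0, 0, :, 0]
print(np.isclose(stats[0, 0, 3, 0], np.mean(np.abs(col - col.mean()))))  # True
```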

