
Is there an equivalent to R's apply function in Python?

I am trying to find a Python equivalent of R's apply function that works with multidimensional arrays.

For example, running the following code:

z <- array(1, dim = 2:4)
apply(z, 1, sum)

The result is:

[1] 12 12

and when called with two values for MARGIN:

apply(z, c(1,2), sum)

The result is:

     [,1] [,2] [,3]
[1,]    4    4    4
[2,]    4    4    4

I found that NumPy's sum function can be used, but not in the same consistent way.

For example:

import numpy as np

xx= np.ones((2,3,4))
np.sum(xx,axis=(1,2))

The result is:

array([12., 12.])

but I can't find a function that behaves like apply, specifically when dealing with MARGIN = c(1,2). Could anyone help?

The equivalent in NumPy is:

xx.sum(axis=2)

That is, you are summing over axis 2 (the last dimension, of length 4), which leaves the other two dimensions (2, 3) as the shape of the result:

array([[4., 4., 4.],
       [4., 4., 4.]])
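The relationship between R's MARGIN and NumPy's axis can be sketched as follows: MARGIN names the dimensions to keep (1-based), while axis names the dimensions to reduce (0-based), so one is the complement of the other. The helper r_margin_to_axis below is a hypothetical name introduced only for this illustration; it is not part of NumPy.

```python
import numpy as np

def r_margin_to_axis(ndim, margin):
    """Hypothetical helper: convert R's 1-based MARGIN (dimensions to keep)
    into the 0-based NumPy axes to reduce over."""
    keep = {m - 1 for m in margin}
    return tuple(ax for ax in range(ndim) if ax not in keep)

xx = np.ones((2, 3, 4))

# Counterpart of apply(z, c(1,2), sum): keep dims 1 and 2, reduce the third.
axis_12 = r_margin_to_axis(xx.ndim, (1, 2))
res_12 = xx.sum(axis=axis_12)

# Counterpart of apply(z, 1, sum): keep dim 1, reduce the other two.
axis_1 = r_margin_to_axis(xx.ndim, (1,))
res_1 = xx.sum(axis=axis_1)

print(axis_12, res_12.shape)
print(axis_1, res_1)
```

This only works for functions that accept an axis argument, as np.sum does.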

Perhaps a more literal translation of your R code would be:

np.apply_over_axes(np.sum, xx, 2)

This gives the same values, but the summed axis is kept with length 1, so the result has shape (2, 3, 1) rather than (2, 3). It is likely to be slower, however, and is not idiomatic unless the operation you are actually performing is more complicated than sum.
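As a quick check of how the two calls relate, here is a minimal sketch that compares their shapes. It assumes the same xx as in the question; the variable names kept and squeezed are made up for the example.

```python
import numpy as np

xx = np.ones((2, 3, 4))

# apply_over_axes re-inserts the reduced axis with length 1.
kept = np.apply_over_axes(np.sum, xx, 2)

# Dropping that length-1 axis recovers the plain sum over axis 2.
squeezed = kept.squeeze(axis=2)

print(kept.shape)
print(squeezed.shape)
print(np.array_equal(squeezed, xx.sum(axis=2)))
```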

np.apply_over_axes differs from R's apply in several ways.

First, np.apply_over_axes takes the axes to collapse, whereas R's apply takes the axes to keep.

Second, np.apply_over_axes applies the function iteratively, one axis at a time, as the documentation quoted below states. The result is the same for np.sum, but it can differ for other functions.

func is called as res = func(a, axis), where axis is the first element of axes. The result res of the function call must have either the same dimensions as a or one less dimension. If res has one less dimension than a, a dimension is inserted before axis. The call to func is then repeated for each axis in axes, with res as the first argument.
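To illustrate why the iterative behaviour matters, here is a small sketch using np.std, chosen as an assumption because the standard deviation of per-axis standard deviations is not the standard deviation of all elements. The 2x2 array is made up for the example.

```python
import numpy as np

a = np.array([[0.0, 0.0],
              [0.0, 4.0]])

# Iterative: std over axis 0 first, then std of that result over axis 1.
iterative = np.apply_over_axes(np.std, a, (0, 1))

# Joint: a single std over all elements of both axes at once.
joint = np.std(a, axis=(0, 1))

print(iterative)  # [[1.]]
print(joint)      # about 1.732, i.e. sqrt(3)
```

R's apply hands each whole slice to the function once, so it corresponds to the joint computation, not the iterative one.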

In addition, the func passed to np.apply_over_axes must accept the arguments (a, axis), and its return value must have a compatible shape, for np.apply_over_axes to work correctly.

Here is an example of how np.apply_over_axes fails:

>>> arr.shape
(5, 4, 3, 2)
>>> np.apply_over_axes(np.mean, arr, (0,1))
array([[[[ 0.05856732, -0.14844212],
         [ 0.34214183,  0.24319846],
         [-0.04807454,  0.04752829]]]])
>>> np_mean = lambda x: np.mean(x)
>>> np.apply_over_axes(np_mean, arr, (0,1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in apply_over_axes
  File "/Users/kwhkim/opt/miniconda3/envs/rtopython2-pip/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 495, in apply_over_axes
    res = func(*args)
TypeError: <lambda>() takes 1 positional argument but 2 were given
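The TypeError arises because np.apply_over_axes always calls func(a, axis). A minimal sketch of one way to make a wrapper compatible is to accept and forward the axis argument. The name np_mean_axis and the seeded generator are assumptions for this example, so the values will not match the output above.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
arr = rng.normal(0, 1, (5, 4, 3, 2))

# The wrapper takes (x, axis), matching how apply_over_axes calls it.
np_mean_axis = lambda x, axis: np.mean(x, axis=axis)

res = np.apply_over_axes(np_mean_axis, arr, (0, 1))
print(res.shape)
```

This removes the error, but the function is still applied one axis at a time, which is harmless for the mean and not for every function.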

Since there seems to be no equivalent function in Python, I wrote a function that is similar to R's apply:

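import numpy as np

def np_apply(arr, axes_remain, fun, *args, **kwargs):
    """Apply fun to each slice of arr spanned by the axes not in axes_remain."""
    # Normalize negative axes, drop duplicates, and sort the axes to keep.
    axes_remain = sorted({ax % arr.ndim for ax in axes_remain})
    # Every other axis is collapsed into a single trailing axis.
    axes_to_move = [ax for ax in range(arr.ndim) if ax not in axes_remain]
    # Move the collapsed axes to the end, preserving their relative order.
    dest = list(range(-len(axes_to_move), 0))
    arr2 = np.moveaxis(arr, axes_to_move, dest)
    # Flatten the trailing axes; reshape copies only when it has to.
    arr2 = arr2.reshape([arr.shape[ax] for ax in axes_remain] + [-1])
    # fun receives each flattened 1-D slice along the last axis.
    return np.apply_along_axis(fun, -1, arr2, *args, **kwargs)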

It works fine, at least for the example above (the values are not exactly identical to the earlier result, but math.isclose() returns True for nearly all elements):

>>> np_apply(arr, (2,3), np.mean)
array([[ 0.05856732, -0.14844212],
       [ 0.34214183,  0.24319846],
       [-0.04807454,  0.04752829]])
>>> np_apply(arr, (2,3), np_mean)
array([[ 0.05856732, -0.14844212],
       [ 0.34214183,  0.24319846],
       [-0.04807454,  0.04752829]])

For the function to work smoothly on large multidimensional arrays, it needs to be optimized. For instance, unnecessary copying of the array should be avoided.
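One possible optimization, sketched here under the assumption that the function accepts a tuple for axis (as np.mean and np.sum do), is to skip the reshape entirely and pass the axes to collapse directly. The slice-by-slice path is included only for comparison; the seeded generator and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
arr = rng.normal(0, 1, (5, 4, 3, 2))

# Direct reduction over axes 0 and 1: no moveaxis, no reshape, no Python-level loop.
direct = np.mean(arr, axis=(0, 1))

# Slice-by-slice path: move axes 0 and 1 to the end, flatten them, apply per slice.
moved = np.moveaxis(arr, (0, 1), (-2, -1)).reshape(3, 2, -1)
slicewise = np.apply_along_axis(np.mean, -1, moved)

print(direct.shape)
print(np.allclose(direct, slicewise))
```

For functions without an axis argument, the slice-by-slice path is still needed.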

Anyway, it seems to work as a proof of concept, and I hope it helps.

PS) arr is generated by arr = np.random.normal(0,1,(5,4,3,2))
