
Conditioning and operation on multidimensional (3D) array in Python-NumPy

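I have a 3-D array (z below) that represents, for example, a sequence of 2D arrays in time (a1 and a2 below). I want to select some values from all of these 2D arrays based on conditions along their two reference axes (x and y below), and then perform some operation (e.g. mean, sum...) on the resulting sequence of "smaller" 2D arrays.

The code below proposes several ways of doing this. I find solution1 very inelegant, but it seems to run faster than solution2. Why is that, and is there a better way (more concise, and more efficient in both speed and memory)?

Regarding Step 2, which option is best, and are there other, more efficient ones? Why does the calculation of C2 not work? Thanks! [Inspired by: Getting the mean of 2D slices of a 3D array in numpy]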

import numpy
import time

# Control parameters (to be modified to make different tests)
xx=1000
yy=6000

# Some 2D arrays, z is a 3D array containing a succession of such arrays (2 here)
a1=numpy.arange(xx*yy).reshape((yy, xx))
a2=numpy.linspace(0,100, num=xx*yy).reshape((yy, xx)) 
z=numpy.array((a1, a2))

# Axes x and y along which conditioning for the 2D arrays is made
x=numpy.arange(xx)
y=numpy.arange(yy) 

# Condition is on x and y, to be applied on a1 and a2 simultaneously
xmin, xmax = xx*0.4, xx*0.8
ymin, ymax = yy*0.2, yy*0.5
xcond = numpy.logical_and(x>=xmin, x<=xmax)
ycond = numpy.logical_and(y>=ymin, y<=ymax)


def solution1():
    xcond2D = numpy.tile(xcond, (yy, 1))
    ycond2D = numpy.tile(ycond[numpy.newaxis].transpose(), (1, xx))
    xymask = numpy.logical_not(numpy.logical_and(xcond2D, ycond2D))
    xymaskzdim = numpy.tile(xymask, (z.shape[0], 1, 1))
    return numpy.ma.MaskedArray(z, xymaskzdim)

def solution2():
    return z[:,:,xcond][:,ycond, :]

start=time.clock()
z1=solution1()
end=time.clock()
print "Solution1: %s sec" % (end-start)
start=time.clock()
z2=solution2()
end=time.clock()
print "Solution2: %s sec" % (end-start)

# Step 2
# Now compute some calculation on the resulting z1 or z2
print "A1: ", z2.reshape(z2.shape[0], z2.shape[1]*z2.shape[2]).mean(axis=1)
print "A2: ", z1.reshape(z1.shape[0], z1.shape[1]*z1.shape[2]).mean(axis=1)
print "B1: ", z2.mean(axis=2).mean(axis=1)
print "B2: ", z1.mean(axis=2).mean(axis=1)
print "Numpy version: ", numpy.version.version
print "C1: ", z2.mean(axis=(1, 2))
print "C2: ", z1.mean(axis=(1, 2))

Output:

Solution1: 0.0568935728474 sec
Solution2: 0.157177904729 sec
A1:  [  2.10060000e+06   3.50100058e+01]
A2:  [2100600.0 35.01000583500077]
B1:  [  2.10060000e+06   3.50100058e+01]
B2:  [2100600.0 35.010005835000975]
Numpy version:  1.7.1
C1:  [  2.10060000e+06   3.50100058e+01]
C2: 
    TypeError: tuple indices must be integers, not tuple

You can improve the speed by switching the order of the selections:

def solution3():
    return z[:,ycond, :][...,xcond]

import timeit

N = 10
print timeit.timeit("solution1()", setup="from __main__ import solution1, solution2, solution3, z, xcond, ycond, xx, yy", number=N)
print timeit.timeit("solution2()", setup="from __main__ import solution1, solution2, solution3, z, xcond, ycond, xx, yy", number=N)
print timeit.timeit("solution3()", setup="from __main__ import solution1, solution2, solution3, z, xcond, ycond, xx, yy", number=N)

# 0.439269065857   # solution1
# 0.752536058426   # solution2
# 0.340197086334   # solution3
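
As a further variation, not part of the original answer: when each boolean condition selects one contiguous run of indices, as xcond and ycond do here, the same sub-block can be taken in a single indexing step with numpy.ix_, or with plain slices, which return a view instead of a copy. The following is a minimal sketch in Python 3 syntax on a small array; the sizes and the names ysl, xsl, sub_ix and sub_view are illustrative assumptions.

```python
import numpy as np

# Small illustrative sizes (assumption; the post uses xx=1000, yy=6000)
xx, yy = 10, 12
z = np.arange(2 * yy * xx, dtype=float).reshape(2, yy, xx)

x = np.arange(xx)
y = np.arange(yy)
xcond = (x >= xx * 0.4) & (x <= xx * 0.8)
ycond = (y >= yy * 0.2) & (y <= yy * 0.5)

# Chained boolean selection, as in solution3 (each step makes a copy)
ref = z[:, ycond, :][..., xcond]

# One-step selection: np.ix_ turns the two masks into an open mesh of indices
yi, xi = np.ix_(ycond, xcond)
sub_ix = z[:, yi, xi]

# If each mask selects one contiguous run, slices give a view (no copy)
yidx = np.flatnonzero(ycond)
xidx = np.flatnonzero(xcond)
ysl = slice(yidx[0], yidx[-1] + 1)
xsl = slice(xidx[0], xidx[-1] + 1)
sub_view = z[:, ysl, xsl]

print(sub_view.mean(axis=(1, 2)))
```

The slice version only applies when the selected indices are contiguous; for scattered selections the boolean or numpy.ix_ forms are still needed.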


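The calculation of C2 does not work because masked arrays (at least in the NumPy 1.7.1 used here) do not support passing a tuple as the axis keyword. Instead, you can do the following: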

print "C2: ", z1.mean(axis=2).mean(axis=1)
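
One caveat worth illustrating: chaining two single-axis means on a masked array equals the mean over all unmasked elements here only because the mask is rectangular, so every unmasked row contributes the same number of elements. The sketch below, in Python 3 syntax, checks this on a small array and contrasts it with an irregular mask; the array sizes, the masks, and the variable names are illustrative assumptions. Flattening the last two axes first, as in A2 from the question, does not depend on the mask shape.

```python
import numpy as np

data = np.arange(24, dtype=float).reshape(2, 3, 4)

def masked(keep2d):
    # Mask is True where an element is hidden, repeated for every 2D array
    mask3d = np.tile(~keep2d, (data.shape[0], 1, 1))
    return np.ma.MaskedArray(data, mask=mask3d)

# Rectangular selection: rows 0-1, columns 1-2
keep = np.zeros((3, 4), dtype=bool)
keep[0:2, 1:3] = True
zm = masked(keep)
chained = zm.mean(axis=2).mean(axis=1)
flat = zm.reshape(zm.shape[0], -1).mean(axis=1)
direct = data[:, 0:2, 1:3].mean(axis=(1, 2))

# Irregular selection: one element in row 0, three elements in row 1
keep_irr = np.zeros((3, 4), dtype=bool)
keep_irr[0, 0] = True
keep_irr[1, 1:4] = True
zm_irr = masked(keep_irr)
chained_irr = zm_irr.mean(axis=2).mean(axis=1)           # mean of row means
flat_irr = zm_irr.reshape(zm_irr.shape[0], -1).mean(axis=1)  # mean of elements

print(chained, flat, direct)
print(chained_irr, flat_irr)
```

With the rectangular mask all three results agree; with the irregular mask the chained version is a mean of row means and differs from the mean over all selected elements.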


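As an aside, it is also worth mentioning that if you time the full calculation, including the mean, the original solution2 is faster than solution1, presumably because 1) masked arrays are slower than normal numpy arrays, and 2) the masked array has many more elements to look at. Of course, solution3 is faster than both, since both of its steps are faster. In other words, masked arrays are generally slow, so turning to them for speed usually does not pay off.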

print timeit.timeit("z2.mean(axis=(1, 2))", setup="from __main__ import z1, z2", number=N)
print timeit.timeit("z1.mean(axis=2).mean(axis=1)", setup="from __main__ import z1, z2", number=N)

# 0.134118080139   # z2.mean normal numpy
# 1.08952116966    # z1.mean masked


To test the efficiency of boolean selection along the different axes, make the array square and try each axis.

print timeit.timeit("z[:,ycond,:]", setup="from __main__ import z, xcond, ycond", number=N)
print timeit.timeit("z[:,:,xcond]", setup="from __main__ import z, xcond, ycond", number=N)

# running the above with xx=6000, yy=6000 gives
# 1.44903206825
# 5.98445320129
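
One plausible explanation for this gap, offered as an assumption and not something the answer states: z is stored in C order, so selecting along axis 1 copies whole contiguous rows, whereas selecting along the last axis gathers individual elements from every row. The sketch below, in Python 3 syntax, inspects the memory layout and times both selections on a smaller square array; the size n, the repetition count, and the variable names are illustrative, and the measured times will vary by machine and NumPy version.

```python
import timeit
import numpy as np

n = 500  # illustrative size (assumption; the post uses 6000)
z = np.arange(2 * n * n, dtype=float).reshape(2, n, n)
idx = np.arange(n)
cond = (idx >= n * 0.2) & (idx <= n * 0.5)

# In C order the last axis has the smallest stride (adjacent in memory)
print(z.flags['C_CONTIGUOUS'], z.strides)

rows = z[:, cond, :]   # selection along axis 1
cols = z[:, :, cond]   # selection along axis 2

t_rows = timeit.timeit(lambda: z[:, cond, :], number=20)
t_cols = timeit.timeit(lambda: z[:, :, cond], number=20)
print("axis 1: %.4f s, axis 2: %.4f s" % (t_rows, t_cols))
```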
