Why is Python numpy slow here?
For example, for gr = np.array([5, 4, 3, 5, 2]) and genx = np.array(["femy_gen_m", "my_gen_m", "my_gen_m", "femy_gen_m", "my_gen_m"]), the output is {'my_gen_m': 3.0, 'femy_gen_m': 5.0}.
Hint: use mean from numpy.
I wrote the function for a unit test already written by my teacher, but the function runs too slowly. My code is attached below.
from timeit import timeit
import numpy as np

# my code
def mean_by_redneg(gr, genx):
    result = {}
    my_gen_m_sum, femy_gen_m_sum = [], []
    for index, element in enumerate(genx):
        if element == 'my_gen_m':
            my_gen_m_sum.append(gr[index])
        if element == 'femy_gen_m':
            femy_gen_m_sum.append(gr[index])
    result['my_gen_m'] = np.asarray(my_gen_m_sum).mean()
    result['femy_gen_m'] = np.asarray(femy_gen_m_sum).mean()
    return result

# check the function
def test(gr, genx, outp):
    ret = mean_by_redneg(np.array(gr), np.array(genx))
    assert np.isclose(ret['femy_gen_m'], outp['femy_gen_m'])
    assert np.isclose(ret['my_gen_m'], outp['my_gen_m'])

test([5, 4, 3, 5, 2], ["femy_gen_m", "my_gen_m", "my_gen_m", "femy_gen_m", "my_gen_m"], {'my_gen_m': 3.0, 'femy_gen_m': 5.0})
test([1, 0] * 10, ['femy_gen_m', 'my_gen_m'] * 10, {'femy_gen_m': 1, 'my_gen_m': 0})
test(range(100), ['femy_gen_m', 'my_gen_m'] * 50, {'femy_gen_m': 49.0, 'my_gen_m': 50.0})
test(list(range(100)) + [100], ['my_gen_m'] * 100 + ['femy_gen_m'], {'my_gen_m': 49.5, 'femy_gen_m': 100.0})

def bm_test(a, b):
    xx = 0
    yy = 0
    im = 0
    fi = 0
    for x, y in zip(a, b):
        if x != y:
            xx += x
            yy += x
            im += 1
            fi += 1
    return xx + yy

N = int(1E5)
gr = np.array([1.1] * N + [2.2] * N)
genx = np.array(['my_gen_m'] * N + ['femy_gen_m'] * N)
bm = timeit("assert np.isclose(mean_by_redneg(gr, genx)['my_gen_m'], 1.1)",
            "from __main__ import np, mean_by_redneg, gr, genx",
            number=1)
reference_bm = timeit("bm_test(gr, genx)",
                      "from __main__ import bm_test, gr, genx",
                      number=1)
assert reference_bm > bm * 10, "too slow"
Do you have any idea how to make this run faster?
PS: Thank you for your time.
The vectorized way to do this in numpy is much simpler than your loopy code. The heart of it would be something like:
out = {}
for gen in ['Male', 'Female']:
    out[gen] = grades[genders == gen].mean()
How this works: genders == gen resolves to an array of True and False called a 'boolean index'. When grades is indexed by it, it returns the values of grades that correspond to the locations in the index that are True. So when gen is 'Male', grades[genders == gen] corresponds to the grades of the 'Male's. Once you've resolved that array, you use its .mean() method to calculate the mean and assign it to the dictionary.
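As a concrete illustration with the arrays from the question (a small sketch, not the original answer's code):

```python
import numpy as np

gr = np.array([5, 4, 3, 5, 2])
genx = np.array(["femy_gen_m", "my_gen_m", "my_gen_m", "femy_gen_m", "my_gen_m"])

# The comparison produces a boolean index over genx
mask = genx == "my_gen_m"
print(mask)             # [False  True  True False  True]

# Indexing gr with the mask keeps only the matching positions
print(gr[mask])         # [4 3 2]
print(gr[mask].mean())  # 3.0
```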
This is significantly faster since the iteration/indexing part is completed in the compiled C code that is the backend of numpy, instead of in interpreted Python code.
Use the following function:
def mean_by_gender2(grades, genders):
    return {g: grades[genders == g].mean() for g in np.unique(genders)}
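For example, applied to the data from the question (a self-contained sketch of the call):

```python
import numpy as np

def mean_by_gender2(grades, genders):
    # np.unique finds each distinct label; the boolean mask selects its grades
    return {g: grades[genders == g].mean() for g in np.unique(genders)}

gr = np.array([5, 4, 3, 5, 2])
genx = np.array(["femy_gen_m", "my_gen_m", "my_gen_m", "femy_gen_m", "my_gen_m"])

result = mean_by_gender2(gr, genx)
print(result['my_gen_m'], result['femy_gen_m'])  # 3.0 5.0
```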
Comparison of execution times (using %timeit) shows:
Admittedly, for very short test data (5 items in each array), your solution is faster (yours: 36 µs, mine: 52.9 µs).
But if you take longer test data (100 items in each array), then my solution is better (yours: 99.5 µs, mine: 62.6 µs).
For yet longer source data the advantage of my solution should be even more apparent.
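A minimal sketch of how such a comparison can be reproduced with timeit (the array size N here is an arbitrary choice, and the exact numbers will vary by machine):

```python
from timeit import timeit

import numpy as np

def mean_by_redneg(gr, genx):
    # the loop-based version from the question
    result = {}
    my_gen_m_sum, femy_gen_m_sum = [], []
    for index, element in enumerate(genx):
        if element == 'my_gen_m':
            my_gen_m_sum.append(gr[index])
        if element == 'femy_gen_m':
            femy_gen_m_sum.append(gr[index])
    result['my_gen_m'] = np.asarray(my_gen_m_sum).mean()
    result['femy_gen_m'] = np.asarray(femy_gen_m_sum).mean()
    return result

def mean_by_gender2(grades, genders):
    # the vectorized version
    return {g: grades[genders == g].mean() for g in np.unique(genders)}

N = 1000  # arbitrary test size
gr = np.array([1.1] * N + [2.2] * N)
genx = np.array(['my_gen_m'] * N + ['femy_gen_m'] * N)

t_loop = timeit(lambda: mean_by_redneg(gr, genx), number=50)
t_vec = timeit(lambda: mean_by_gender2(gr, genx), number=50)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```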
Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.