I want to find the fastest way to compute the average of Python lists. I have millions of lists stored in a dictionary, so I am looking for the most efficient approach in terms of performance.
Referring to this question, if l is a list of floats, I have these options:
numpy.mean(l)
sum(l) / float(len(l))
reduce(lambda x, y: x + y, l) / len(l)
Which way would be the fastest?
As @DeepSpace suggested, you should try to answer this question yourself by measuring. You might also consider converting your list into an array before calling numpy.mean. Use %timeit with IPython as follows:
In [1]: import random
In [2]: import numpy
In [3]: from functools import reduce
In [4]: l = random.sample(range(0, 100), 50) # generates a random list of 50 elements
numpy.mean without converting to an np.array:
In [5]: %timeit numpy.mean(l)
32.5 µs ± 2.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
numpy.mean after converting to an np.array:
In [5]: a = numpy.array(l) # convert the list l once, outside the timed statement
In [6]: %timeit numpy.mean(a)
17.6 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sum(l) / float(len(l)):
In [5]: %timeit sum(l) / float(len(l)) # the float() cast is not required in Python 3
774 ns ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sum(l) / len(l):
In [5]: %timeit sum(l) / len(l)
623 ns ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
reduce:
In [6]: %timeit reduce(lambda x, y: x + y, l) / len(l)
5.92 µs ± 514 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
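Outside IPython, the same comparison can be sketched with the standard-library timeit module. This is a minimal sketch, not the script used for the numbers above: the candidates dictionary and the best_time helper are names made up for illustration, the loop counts are arbitrary, and the numpy entries are simply skipped if numpy is not installed.

```python
import random
import timeit
from functools import reduce

try:
    import numpy
except ImportError:  # numpy is optional in this sketch
    numpy = None

l = random.sample(range(0, 100), 50)  # random list of 50 distinct integers

# Hypothetical registry of the expressions being compared.
candidates = {
    "sum(l) / len(l)": lambda: sum(l) / len(l),
    "sum(l) / float(len(l))": lambda: sum(l) / float(len(l)),
    "reduce(lambda x, y: x + y, l) / len(l)":
        lambda: reduce(lambda x, y: x + y, l) / len(l),
}
if numpy is not None:
    a = numpy.array(l)  # converted once, outside the timed call
    candidates["numpy.mean(l)"] = lambda: numpy.mean(l)
    candidates["numpy.mean(a)"] = lambda: numpy.mean(a)


def best_time(func, number=2000, repeat=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


results = {name: best_time(func) for name, func in candidates.items()}

# Print from slowest to fastest, in microseconds per call.
for name, seconds in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print("{:<42} {:8.3f} µs".format(name, seconds * 1e6))
```

Wrapping each expression in a lambda adds a small constant call overhead, so the absolute numbers will differ somewhat from what %timeit reports.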
From slowest to fastest:
numpy.mean(l), without converting to an array
numpy.mean(a), after converting the list to an np.array
reduce(lambda x, y: x + y, l) / len(l)
sum(l) / float(len(l)), which works in both Python 2 and 3
sum(l) / len(l), for Python 3, where the float cast is not needed
Good afternoon, I just ran a timing test on a list of 10 random floats and found numpy to be the fastest.
#!/usr/bin/python
import numpy as np
from functools import reduce
import time

l = [0.1, 2.3, 23.345, 0.9012, .002815, 8.2, 13.9, 0.4, 3.02, 10.1]

def test1():
    return np.mean(l)

def test2():
    return sum(l) / float(len(l))

def test3():
    return reduce(lambda x, y: x + y, l) / len(l)

def timed():
    # Time a single call of each function using wall-clock timestamps.
    start = time.time()
    test1()
    print('{} seconds'.format(time.time() - start))
    start = time.time()
    test2()
    print('{} seconds'.format(time.time() - start))
    start = time.time()
    test3()
    print('{} seconds'.format(time.time() - start))

timed()
As always I'm sure there's a better way to do this but this does the trick. This was a small list: it would be interesting to see what you find with large lists.
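To follow up on the large-list question, one possible approach is to repeat each measurement many times over several list sizes instead of timing a single call. The sketch below assumes that setup: the compare and time_call helpers, the chosen sizes, and the loop counts are all illustrative choices, numpy is treated as optional, and no particular outcome is implied.

```python
import random
import timeit

try:
    import numpy as np
except ImportError:  # numpy is optional in this sketch
    np = None


def time_call(func, number, repeat=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def compare(sizes=(10, 1000, 100000)):
    """Time each way of averaging a list of random floats, per list size."""
    rows = []
    for size in sizes:
        l = [random.random() for _ in range(size)]
        # Fewer loops for bigger lists so the total run time stays short.
        number = max(1, 200000 // size)
        row = {"size": size,
               "sum(l) / len(l)": time_call(lambda: sum(l) / len(l), number)}
        if np is not None:
            a = np.array(l)  # conversion cost is not included in the timing
            row["np.mean(l)"] = time_call(lambda: np.mean(l), number)
            row["np.mean(a)"] = time_call(lambda: np.mean(a), number)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in compare():
        size = row.pop("size")
        for name, seconds in row.items():
            print("n={:<7} {:<16} {:10.3f} µs".format(size, name, seconds * 1e6))
```

Taking the minimum over repeated runs reduces the influence of one-off effects such as the first call into numpy, which a single time.time() measurement cannot separate out.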