import numpy as np
from collections import namedtuple
from random import random
Smoker = namedtuple("Smoker", ["Female","Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female","Male"])
LST = [(Smoker(random(),random()),Nonsmoker(random(),random())) for i in range(100)]
So I have a long list whose elements are tuples. Each tuple contains a pair of namedtuples. What is the fastest way to find the average of this list? Ideally the result is still of the same structure, that is, (Smoker(Female=w,Male=x),Nonsmoker(Female=y,Male=z))
grizzly = Smoker(np.mean([a.Female for a,b in LST]),np.mean([a.Male for a,b in LST]))
panda = Nonsmoker(np.mean([b.Female for a,b in LST]),np.mean([b.Male for a,b in LST]))
result = (grizzly, panda)
np.mean has to convert the list to an array, which takes time. Python sum saves time:
In [6]: %%timeit
...: grizzly = Smoker(np.mean([a.Female for a,b in LST]),np.mean([a.Male for a,b in LST]))
...: panda = Nonsmoker(np.mean([b.Female for a,b in LST]),np.mean([b.Male for a,b in LST]))
...: result = (grizzly, panda)
...:
158 µs ± 597 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: %%timeit
...: n=len(LST)
...: grizzly = Smoker(sum([a.Female for a,b in LST])/n,sum([a.Male for a,b in LST])/n)
...: panda = Nonsmoker(sum([b.Female for a,b in LST])/n,sum([b.Male for a,b in LST])/n)
...: result = (grizzly, panda)
...:
46.2 µs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Both produce the same result (to within a small epsilon):
In [8]: result
Out[8]:
(Smoker(Female=0.5383695316982974, Male=0.5493854404111675),
Nonsmoker(Female=0.4913454565011218, Male=0.47143788469638825))
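The sum-based approach above can also be wrapped in a small helper that walks the list once and returns the same (Smoker, Nonsmoker) structure. This is only a sketch: the name average_pairs is hypothetical, it assumes a non-empty list, and it was not timed against the four separate comprehensions.

```python
from collections import namedtuple
from random import random

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])


def average_pairs(pairs):
    """Average a non-empty list of (Smoker, Nonsmoker) pairs field by field.

    Hypothetical helper, not part of the original answer.
    """
    n = len(pairs)
    # Flatten each pair into a 4-tuple, then transpose with zip to get 4 columns.
    columns = zip(*((a.Female, a.Male, b.Female, b.Male) for a, b in pairs))
    sf, sm, nf, nm = (sum(col) / n for col in columns)
    return Smoker(sf, sm), Nonsmoker(nf, nm)


LST = [(Smoker(random(), random()), Nonsmoker(random(), random())) for _ in range(100)]
result = average_pairs(LST)
```

The return value unpacks the same way as the tuple built by hand, e.g. grizzly, panda = average_pairs(LST).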
If you could collect the values in one array, possibly of shape (n,4), then the mean would be fast. For a one-time calculation it probably isn't worth it, since building the array costs more than taking the mean:
In [11]: M = np.array([(a.Female, a.Male, b.Female, b.Male) for a,b in LST])
In [12]: np.mean(M, axis=0)
Out[12]: array([0.53836953, 0.54938544, 0.49134546, 0.47143788])
In [13]: timeit M = np.array([(a.Female, a.Male, b.Female, b.Male) for a,b in LST])
128 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [14]: timeit np.mean(M, axis=0)
21.9 µs ± 371 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
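The (n,4) mean comes back as a flat array, so it still has to be repacked to get the structure asked for in the question. A minimal sketch, assuming the column order (Smoker.Female, Smoker.Male, Nonsmoker.Female, Nonsmoker.Male) used in In [11]; the function name mean_via_array is hypothetical:

```python
from collections import namedtuple

import numpy as np

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])


def mean_via_array(pairs):
    """Column-wise mean through an (n, 4) array, repacked into namedtuples."""
    M = np.array([(a.Female, a.Male, b.Female, b.Male) for a, b in pairs])
    # tolist() turns the four NumPy scalars into plain Python floats.
    m = M.mean(axis=0).tolist()
    return Smoker(*m[:2]), Nonsmoker(*m[2:])


example = [(Smoker(1, 2), Nonsmoker(3, 4)), (Smoker(3, 4), Nonsmoker(5, 6))]
averaged = mean_via_array(example)
```

The same slicing works on a (2,2) mean such as the one from np.array(LST).mean(axis=0), where row 0 holds the Smoker fields and row 1 the Nonsmoker fields.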
Since named tuples can be accessed like regular tuples, we can make an array directly from LST:
In [16]: np.array(LST).shape
Out[16]: (100, 2, 2)
In [17]: np.array(LST).mean(axis=0)
Out[17]:
array([[0.53836953, 0.54938544],
[0.49134546, 0.47143788]])
But timing isn't encouraging:
In [18]: timeit np.array(LST).mean(axis=0)
1.26 ms ± 7.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I can also make a structured array from your list - with nested dtypes:
In [26]: dt = np.dtype([('Smoker', [('Female','f'),('Male','f')]),('Nonsmoker',[
...: ('Female','f'),('Male','f')])])
In [27]: M=np.array(LST,dt)
In [28]: M['Smoker']['Female'].mean()
Out[28]: 0.53836954
Curiously, the timing is relatively good:
In [29]: timeit M=np.array(LST,dt)
40.6 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
But I have to take each mean separately, or else convert it to an unstructured array first.
I could make a (n,4) float array from the structured one with a view or a recfunctions utility:
In [53]: M1 = M.view([('f0','f',(4,))])['f0']
In [54]: M1.shape
Out[54]: (100, 4)
In [55]: M2=rf.structured_to_unstructured(M)   # rf: import numpy.lib.recfunctions as rf
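Putting the structured-array pieces together, the sketch below builds the array, takes the means both field by field and through the unstructured (n,4) form, and repacks each into namedtuples. One assumption differs from the session above: the fields are declared as 'f8' (float64) rather than 'f' (float32), so the result keeps full precision. The names by_field and by_flat are illustrative only.

```python
from collections import namedtuple
from random import random

import numpy as np
import numpy.lib.recfunctions as rf

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random())) for _ in range(100)]

# Nested dtype mirroring the (Smoker, Nonsmoker) pair; 'f8' is an assumption
# (the session above uses 'f', i.e. float32).
dt = np.dtype([
    ("Smoker", [("Female", "f8"), ("Male", "f8")]),
    ("Nonsmoker", [("Female", "f8"), ("Male", "f8")]),
])
M = np.array(LST, dtype=dt)

# Option 1: take each mean separately, one field at a time.
by_field = (
    Smoker(M["Smoker"]["Female"].mean(), M["Smoker"]["Male"].mean()),
    Nonsmoker(M["Nonsmoker"]["Female"].mean(), M["Nonsmoker"]["Male"].mean()),
)

# Option 2: flatten the nested fields to an (n, 4) float array, then one mean call.
flat = rf.structured_to_unstructured(M)
m = flat.mean(axis=0)
by_flat = (Smoker(*m[:2]), Nonsmoker(*m[2:]))
```

Both options hold NumPy float64 scalars in the namedtuples; calling m.tolist() before repacking would give plain Python floats instead.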