
What is the fastest way to find the average for a list of tuples in Python, each tuple containing a pair of namedtuples?

import numpy as np
from collections import namedtuple
from random import random

Smoker    = namedtuple("Smoker", ["Female","Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female","Male"])

LST = [(Smoker(random(),random()),Nonsmoker(random(),random())) for i in range(100)]

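So I have a long list whose elements are tuples. Each tuple contains a pair of namedtuples. What is the fastest way to find the average of this list? Ideally the result keeps the same structure, i.e. (Smoker(Female=w,Male=x),Nonsmoker(Female=y,Male=z)).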

grizzly = Smoker(np.mean([a.Female for a,b in LST]),np.mean([a.Male for a,b in LST]))
panda = Nonsmoker(np.mean([b.Female for a,b in LST]),np.mean([b.Male for a,b in LST]))
result = (grizzly, panda)

np.mean has to convert the list to an array, which takes time. Python's built-in sum saves time:

In [6]: %%timeit
   ...: grizzly = Smoker(np.mean([a.Female for a,b in LST]),np.mean([a.Male for a,b in LST]))
   ...: panda = Nonsmoker(np.mean([b.Female for a,b in LST]),np.mean([b.Male for a,b in LST]))
   ...: result = (grizzly, panda)
   ...: 
   ...: 
158 µs ± 597 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %%timeit
   ...: n=len(LST)
   ...: grizzly = Smoker(sum([a.Female for a,b in LST])/n,sum([a.Male for a,b in LST])/n)
   ...: panda = Nonsmoker(sum([b.Female for a,b in LST])/n,sum([b.Male for a,b in LST])/n)
   ...: result = (grizzly, panda)
   ...: 
   ...: 
46.2 µs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Both produce the same result (to within a small epsilon):

In [8]: result
Out[8]: 
(Smoker(Female=0.5383695316982974, Male=0.5493854404111675),
 Nonsmoker(Female=0.4913454565011218, Male=0.47143788469638825))
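
As a rough sketch of the comparison above, the sum-based version can be wrapped in a small helper and checked against np.mean. The name mean_pairs is my own (hypothetical, not from the question), and np.allclose stands in for the "small epsilon" check:

```python
import numpy as np
from collections import namedtuple
from random import random

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])

def mean_pairs(pairs):
    """Average (Smoker, Nonsmoker) pairs field by field with built-in sum.

    Hypothetical helper name; the logic mirrors the sum-based timing above.
    """
    n = len(pairs)
    smoker = Smoker(sum(a.Female for a, _ in pairs) / n,
                    sum(a.Male for a, _ in pairs) / n)
    nonsmoker = Nonsmoker(sum(b.Female for _, b in pairs) / n,
                          sum(b.Male for _, b in pairs) / n)
    return smoker, nonsmoker

LST = [(Smoker(random(), random()), Nonsmoker(random(), random()))
       for _ in range(100)]

via_sum = mean_pairs(LST)
via_numpy = (
    Smoker(np.mean([a.Female for a, b in LST]), np.mean([a.Male for a, b in LST])),
    Nonsmoker(np.mean([b.Female for a, b in LST]), np.mean([b.Male for a, b in LST])),
)

# The two results may differ in the last few bits, so compare with a tolerance.
same = np.allclose(via_sum, via_numpy)
print(via_sum)
print(same)
```

The helper returns plain Python floats inside the namedtuples, whereas the np.mean version holds NumPy float scalars.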

If you can collect the values in one array, perhaps of shape (n,4), then the mean is fast. For a one-off calculation it may not be worth it:

In [11]: M = np.array([(a.Female, a.Male, b.Female, b.Male) for a,b in LST])
In [12]: np.mean(M, axis=0)
Out[12]: array([0.53836953, 0.54938544, 0.49134546, 0.47143788])

In [13]: timeit M = np.array([(a.Female, a.Male, b.Female, b.Male) for a,b in LST])
128 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [14]: timeit np.mean(M, axis=0)
21.9 µs ± 371 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
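
To get back to the structure asked for in the question, the four column means can be split into the two namedtuples. This is only a sketch; the slicing assumes the column order (Smoker.Female, Smoker.Male, Nonsmoker.Female, Nonsmoker.Male) used when building M:

```python
import numpy as np
from collections import namedtuple
from random import random

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random()))
       for _ in range(100)]

# One row per pair, four float columns.
M = np.array([(a.Female, a.Male, b.Female, b.Male) for a, b in LST])

# Column means, converted to plain Python floats.
means = M.mean(axis=0).tolist()

# First two columns belong to Smoker, last two to Nonsmoker.
result = (Smoker(*means[:2]), Nonsmoker(*means[2:]))
print(result)
```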

Since a namedtuple can be accessed like a regular tuple, we can make an array directly from LST:

In [16]: np.array(LST).shape
Out[16]: (100, 2, 2)
In [17]: np.array(LST).mean(axis=0)
Out[17]: 
array([[0.53836953, 0.54938544],
       [0.49134546, 0.47143788]])

But the timing isn't encouraging:

In [18]: timeit np.array(LST).mean(axis=0)
1.26 ms ± 7.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
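
For completeness, here is how the (2,2) mean from the direct conversion could be turned back into namedtuples. It is the slow route according to the timing above, so treat it as an illustration of the shape handling rather than a recommendation:

```python
import numpy as np
from collections import namedtuple
from random import random

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random()))
       for _ in range(100)]

arr = np.array(LST)          # shape (100, 2, 2): pair index, group, field
means = arr.mean(axis=0)     # shape (2, 2): row 0 is Smoker, row 1 is Nonsmoker

# _make builds a namedtuple from any iterable of the right length.
result = (Smoker._make(means[0].tolist()), Nonsmoker._make(means[1].tolist()))
print(result)
```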

I could also make a structured array from your list, using a nested dtype:

In [26]: dt = np.dtype([('Smoker', [('Female','f'),('Male','f')]),('Nonsmoker',[
    ...: ('Female','f'),('Male','f')])])
In [27]: M=np.array(LST,dt)
In [28]: M['Smoker']['Female'].mean()
Out[28]: 0.53836954

Curiously, the timing is relatively good:

In [29]: timeit M=np.array(LST,dt)
40.6 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

But I have to take each mean separately, or else convert it to an unstructured array first.
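
Taking each mean separately might look like the sketch below. It loops over the namedtuple field names instead of writing out four expressions. Note that the 'f' dtype is float32, so the results carry only single precision:

```python
import numpy as np
from collections import namedtuple
from random import random

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random()))
       for _ in range(100)]

# Nested structured dtype, same layout as above ('f' means float32).
dt = np.dtype([("Smoker", [("Female", "f"), ("Male", "f")]),
               ("Nonsmoker", [("Female", "f"), ("Male", "f")])])
M = np.array(LST, dt)

# One mean per nested field, converted to Python floats.
result = (
    Smoker(*(float(M["Smoker"][name].mean()) for name in Smoker._fields)),
    Nonsmoker(*(float(M["Nonsmoker"][name].mean()) for name in Nonsmoker._fields)),
)
print(result)
```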

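I can make a (n,4) float array from the structured array with view or with a recfunctions utility (rf below is numpy.lib.recfunctions, i.e. import numpy.lib.recfunctions as rf):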

In [53]: M1 = M.view([('f0','f',(4,))])['f0']
In [54]: M1.shape
Out[54]: (100, 4)
In [55]: M2=rf.structured_to_unstructured(M)
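
Putting the pieces together, a minimal end-to-end sketch of the structured-array route could be the following. It assumes a NumPy version that ships structured_to_unstructured (1.16 or later, as far as I know), and the final slicing assumes the field order of dt:

```python
import numpy as np
import numpy.lib.recfunctions as rf
from collections import namedtuple
from random import random

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random()))
       for _ in range(100)]

dt = np.dtype([("Smoker", [("Female", "f"), ("Male", "f")]),
               ("Nonsmoker", [("Female", "f"), ("Male", "f")])])
M = np.array(LST, dt)

# Flatten the nested fields into a plain (n, 4) float32 array.
M2 = rf.structured_to_unstructured(M)

# Column means in field order: Smoker.Female, Smoker.Male,
# Nonsmoker.Female, Nonsmoker.Male.
means = M2.mean(axis=0).tolist()
result = (Smoker(*means[:2]), Nonsmoker(*means[2:]))
print(result)
```

Because the data is stored as float32, the means agree with the sum-based result only to about single precision.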

