How do I create a sublist that contains the last element, but uses a general formula for all other sublists of the same size?

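I have a long list, let's call it y, with len(y) = 500. I am purposely not including y in the code.

For each item in y, I want to find the average of that item and its 5 preceding values. I run into a problem when I reach the last item in the list, because I need to use "a+1" in one of the lines below.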

a = 0
SMAlist = []
for each_item in y:
    if a > 4 and a < ((len(y))-1): # finding my averages begin at 6th item
        b = (y[a-5:a+1]) # this line doesn't work for the last item in y
        SMAsix = round((sum(b)/6),2)
        SMAlist.append(SMAsix)
    if a > ((len(y))-2): # this line seems unnecessary. How can I avoid it?
        b = (y[-6:-1]+[y[a]]) # Should I just use negative values in general?
        SMAsix = round((sum(b)/6),2)
        SMAlist.append(SMAsix)
    a = a+1
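
A note on the slicing concern raised in the comments above: Python slices accept an end index equal to len(y), so y[a-5:a+1] is also valid for the last item and no separate branch is needed. A minimal sketch under that observation, using a short made-up list in place of y (the name `trailing_averages` and the sample values are illustrative, not from the original post):

```python
def trailing_averages(values, window=6):
    """Average each item together with its (window - 1) preceding items."""
    averages = []
    # Start at the first index that has window - 1 predecessors.
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1:i + 1]  # end index may equal len(values)
        averages.append(round(sum(chunk) / window, 2))
    return averages


sample = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative stand-in for y
print(trailing_averages(sample))  # [3.5, 4.5, 5.5]
```

For a 500-item list this yields 495 averages, one for each item from the 6th onward.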

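You can chunk the list and build averages over the chunks. The linked answer uses full chunks; I adapted it to build incremental (overlapping) chunks:

Sliding averages via list comprehension: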

# Inspiration for a "full" chunk I adapted: https://stackoverflow.com/a/312464/7505395
def overlappingChunks(l, n):
    """Yield overlapping n-sized chunks from l."""
    for i in range(0, len(l)):
        yield l[i:i + n]

somenums = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,
            18491.18,16908,15266.43]

# avg over sublist-lengths
slideAvg5 = [ round(sum(part)/(len(part)*1.0),2) for part in overlappingChunks(somenums,6)]

print (slideAvg5)    

Output:

[11271.73, 11850.11, 13099.36, 14056.93, 14725.22, 15343.27, 16135.52, 
 16888.54, 16087.22, 15266.43]

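I was going to partition the list by an incremental range(len(yourlist)) before averaging the partitions, but full partitioning was already solved here: How do you split a list into evenly sized chunks? I adapted that answer to yield incremental chunks so that it applies to your problem.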


Which partitions are used for each average?

explained = {(idx,tuple(part)): round(sum(part)/(len(part)*1.0),2) for idx,part in
             enumerate(overlappingChunks(somenums,6))}
import pprint
pprint.pprint(explained)

Output (reformatted):

# Input:
# [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]

# Index           partitioned part of the input list                          avg 

{(0,     (10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25))    : 11271.73,
 (1,     (10995.72, 11162.55, 11256.7, 11634.98, 12174.25, 13876.47))    : 11850.11,
 (2,     (11162.55, 11256.7, 11634.98, 12174.25, 13876.47, 18491.18))    : 13099.36,
 (3,     (11256.7, 11634.98, 12174.25, 13876.47, 18491.18, 16908))       : 14056.93,
 (4,     (11634.98, 12174.25, 13876.47, 18491.18, 16908, 15266.43))      : 14725.22,
 (5,     (12174.25, 13876.47, 18491.18, 16908, 15266.43))                : 15343.27,
 (6,     (13876.47, 18491.18, 16908, 15266.43))                          : 16135.52,
 (7,     (18491.18, 16908, 15266.43))                                    : 16888.54,
 (8,     (16908, 15266.43))                                              : 16087.22,
 (9,     (15266.43,))                                                    : 15266.43}
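
The table shows that the chunks from index 5 onward hold fewer than six items, so the last five averages cover fewer values. If only complete windows are wanted, one possible variation is to stop the generator early. A sketch, where `full_chunks` is a hypothetical name and not part of the answer above:

```python
def full_chunks(seq, n):
    """Yield only the overlapping chunks of seq that contain exactly n items."""
    # The last valid start index is len(seq) - n; an empty range means no chunks.
    for i in range(len(seq) - n + 1):
        yield seq[i:i + n]


somenums = [10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25,
            13876.47, 18491.18, 16908, 15266.43]

full_avgs = [round(sum(part) / len(part), 2) for part in full_chunks(somenums, 6)]
print(full_avgs)
```

With the ten sample numbers this produces five averages, corresponding to rows 0 through 4 of the table.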

Option 1: Pandas

import pandas as pd

y = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]
series = pd.Series(y)
print(series.rolling(window=6, center=True).mean().dropna().tolist())

Option 2: NumPy

import numpy as np
window=6
s=np.insert(np.cumsum(np.array(y)), 0, [0])
output = (s[window :] - s[:-window]) * (1. / window)
print(list(output))

Output

[11271.731666666667, 11850.111666666666, 13099.355, 14056.930000000002, 14725.218333333332]
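
The NumPy version relies on a cumulative-sum identity: the sum of a window equals the difference between two running totals that lie `window` positions apart. The same idea can be sketched with only the standard library. The name `rolling_mean_cumsum` is illustrative, and this is not meant as a replacement for the NumPy code:

```python
from itertools import accumulate


def rolling_mean_cumsum(values, window):
    """Rolling mean over full windows via differences of running totals."""
    # totals[k] holds the sum of the first k values, with totals[0] == 0.
    totals = [0.0] + list(accumulate(values))
    # Pair each total with the one `window` positions earlier.
    return [(hi - lo) / window for hi, lo in zip(totals[window:], totals)]


y = [10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25,
     13876.47, 18491.18, 16908, 15266.43]
means = rolling_mean_cumsum(y, 6)
print(means)
```

Up to floating-point rounding in the last digits, the five values should agree with the output listed above.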

Timings (will vary with the size of the data)

# Pandas
59.5 µs ± 8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Numpy
19 µs ± 4.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# @PatrickArtner's solution
16.1 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Update

Timing check code (works in a Jupyter notebook)

%%timeit
import pandas as pd

y = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]
series = pd.Series(y)

A word of warning about @Vivek Kalyanarangan's "zipper" solution: for longer sequences it is vulnerable to loss of significance. For clarity, let's use float32:

>>> y = (1000 + np.sin(np.arange(1000000))).astype(np.float32)
>>> window=6
>>> 
# naive zipper solution
>>> s=np.insert(np.cumsum(np.array(y)), 0, [0])
>>> output = (s[window :] - s[:-window]) * (1. / window)
# towards the end the result is clearly wrong
>>> print(output[-10:])
[1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024.]
>>> 
# this can be alleviated by first taking the difference and then summing
>>> np.cumsum(np.r_[y[:window].sum(), y[window:]-y[:-window]])/window
array([1000.02936,  999.98285,  999.9521 , ..., 1000.0247 , 1000.05304,
       1000.0367 ], dtype=float32)
>>> 
# compare to last value calculated directly for reference
>>> np.mean(y[-6:])
1000.03217

To reduce the error further, one could chunk y and re-anchor the cumulative sum every so many terms, without losing much speed.
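
One way to read the chunk-and-anchor suggestion, sketched in plain Python rather than NumPy: keep a window sum that is updated incrementally, and recompute it from scratch every `anchor_every` steps so that rounding error cannot accumulate across the whole sequence. The function name and the `anchor_every` parameter are assumptions made for illustration; the answer itself does not give an implementation.

```python
import math


def rolling_mean_anchored(values, window, anchor_every=1024):
    """Rolling mean over full windows with a periodically re-anchored sum."""
    means = []
    running = 0.0
    for start in range(len(values) - window + 1):
        if start % anchor_every == 0:
            # Anchor: recompute the window sum directly to discard drift.
            running = math.fsum(values[start:start + window])
        else:
            # Incremental update: add the newest item, drop the oldest one.
            running += values[start + window - 1] - values[start - 1]
        means.append(running / window)
    return means


data = [1000 + math.sin(i) for i in range(10000)]
result = rolling_mean_anchored(data, 6)
# Compare the last rolling mean with a directly computed reference value.
print(result[-1], math.fsum(data[-6:]) / 6)
```

A smaller `anchor_every` bounds the drift more tightly at the cost of more full recomputations; the default of 1024 is an arbitrary choice for this sketch.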

