How do I create a sublist that includes the last element, while using a general formula for all other sublists of the same size?

Question

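I have a long list; let's call it y. len(y) = 500. I am deliberately not including y in the code.

For each item in y, I want to find the average of that item and its 5 preceding values. I run into a problem when I reach the last item in the list, because I need to use "a+1" in one of the lines below.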

a = 0
SMAlist = []
for each_item in y:
    if a > 4 and a < ((len(y))-1): # finding my averages begin at 6th item
        b = (y[a-5:a+1]) # this line doesn't work for the last item in y
        SMAsix = round((sum(b)/6),2)
        SMAlist.append(SMAsix)
    if a > ((len(y))-2): # this line seems unnecessary. How can I avoid it?
        b = (y[-6:-1]+[y[a]]) # Should I just use negative values in general?
        SMAsix = round((sum(b)/6),2)
        SMAlist.append(SMAsix)
    a = a+1

Answer 1

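You can chunk the list and compute the averages over those chunks. The linked answer uses full chunks; I adapted it to build incremental chunks:

Sliding average via a list comprehension: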

# Inspiration for a "full" chunk I adapted: https://stackoverflow.com/a/312464/7505395
def overlappingChunks(l, n):
    """Yield overlapping n-sized chunks from l."""
    for i in range(0, len(l)):
        yield l[i:i + n]

somenums = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,
            18491.18,16908,15266.43]

# avg over sublist-lengths
slideAvg5 = [ round(sum(part)/(len(part)*1.0),2) for part in overlappingChunks(somenums,6)]

print (slideAvg5)

Output:

[11271.73, 11850.11, 13099.36, 14056.93, 14725.22, 15343.27, 16135.52, 
 16888.54, 16087.22, 15266.43]

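Before averaging the partitions, I was going to slice the list into parts over an incrementing range(len(yourlist)), but that is essentially the full partitioning already solved here: How do you split a list into evenly sized chunks? I adapted it to yield incremental chunks so that it applies to your problem.

Which partitions go into each average?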

explained = {(idx,tuple(part)): round(sum(part)/(len(part)*1.0),2) for idx,part in
             enumerate(overlappingChunks(somenums,6))}
import pprint
pprint.pprint(explained)

Output (reformatted):

# Input:
# [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]

# Index           partitioned part of the input list                         avg

{(0,     (10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25))    : 11271.73,
 (1,     (10995.72, 11162.55, 11256.7, 11634.98, 12174.25, 13876.47))    : 11850.11,
 (2,     (11162.55, 11256.7, 11634.98, 12174.25, 13876.47, 18491.18))    : 13099.36,
 (3,     (11256.7, 11634.98, 12174.25, 13876.47, 18491.18, 16908))       : 14056.93,
 (4,     (11634.98, 12174.25, 13876.47, 18491.18, 16908, 15266.43))      : 14725.22,
 (5,     (12174.25, 13876.47, 18491.18, 16908, 15266.43))                : 15343.27,
 (6,     (13876.47, 18491.18, 16908, 15266.43))                          : 16135.52,
 (7,     (18491.18, 16908, 15266.43))                                    : 16888.54,
 (8,     (16908, 15266.43))                                              : 16087.22,
 (9,     (15266.43,))                                                    : 15266.43}
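Note that the chunks above look forward from each index and shrink toward the end of the list, so the last five averages cover fewer than six values. If only full windows are wanted (an item together with its five preceding values, as the question describes), a minimal sketch could look like the following. The name trailing_windows is hypothetical and not part of the original answer.

```python
def trailing_windows(values, n):
    """Yield every full n-sized window, ending at index n-1, n, ..., len(values)-1."""
    for end in range(n, len(values) + 1):
        yield values[end - n:end]

somenums = [10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25, 13876.47,
            18491.18, 16908, 15266.43]

# One average per full window of 6; the last window ends at the last item,
# so no special case is needed for the end of the list.
trailing_avg = [round(sum(w) / len(w), 2) for w in trailing_windows(somenums, 6)]
print(trailing_avg)
```

Because slicing never raises on the upper bound, the last item is handled by the same expression as every other item. The result should match the averages listed for indices 0 to 4 above, since those are the same full windows.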

Answer 2

Option 1: Pandas

import pandas as pd

y = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]
series = pd.Series(y)
print(series.rolling(window=6, center=True).mean().dropna().tolist())

Option 2: Numpy

import numpy as np
window=6
s=np.insert(np.cumsum(np.array(y)), 0, [0])
output = (s[window :] - s[:-window]) * (1. / window)
print(list(output))

Output

[11271.731666666667, 11850.111666666666, 13099.355, 14056.930000000002, 14725.218333333332]
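The Numpy option relies on a prefix-sum identity: if s[k] is the sum of the first k values, then the sum of any window of length w starting at index i is s[i + w] - s[i]. The sketch below restates that identity in plain Python on a small list so the indexing is easy to follow. The function name rolling_mean_prefix is an illustrative assumption, not part of the answer.

```python
from itertools import accumulate

def rolling_mean_prefix(values, window):
    # prefix[k] == sum(values[:k]); the leading 0 plays the same role as
    # np.insert(np.cumsum(...), 0, [0]) in the Numpy version.
    prefix = [0] + list(accumulate(values))
    # The sum of values[i:i + window] is prefix[i + window] - prefix[i].
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(values) - window + 1)]

print(rolling_mean_prefix([1, 2, 3, 4, 5, 6, 7], 3))  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

Each window costs one subtraction regardless of its length, which is why this approach scales well for large windows.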

Timings (these will vary with the size of the data)

# Pandas
59.5 µs ± 8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Numpy
19 µs ± 4.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# @PatrickArtner's solution
16.1 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Update

Timing code for checking (works in a Jupyter notebook)

%%timeit
import pandas as pd

y = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]
series = pd.Series(y)

Answer 3

@Vivek Kalyanarangan's "zipper" solution comes with a caveat: for longer sequences it is vulnerable to loss of significance. To make the effect obvious, we use float32:

>>> y = (1000 + np.sin(np.arange(1000000))).astype(np.float32)
>>> window=6
>>> 
# naive zipper solution
>>> s=np.insert(np.cumsum(np.array(y)), 0, [0])
>>> output = (s[window :] - s[:-window]) * (1. / window)
# towards the end the result is clearly wrong
>>> print(output[-10:])
[1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024.]
>>> 
# this can be alleviated by first taking the difference and then summing
>>> np.cumsum(np.r_[y[:window].sum(), y[window:]-y[:-window]])/window
array([1000.02936,  999.98285,  999.9521 , ..., 1000.0247 , 1000.05304,
       1000.0367 ], dtype=float32)
>>> 
# compare to last value calculated directly for reference
>>> np.mean(y[-6:])
1000.03217

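To reduce the error further, one could split y into chunks and re-anchor the cumulative sum every so many terms, without losing much speed.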
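A minimal sketch of that chunking idea, assuming numpy is available and the input is a floating-point array at least as long as the window. The function name rolling_mean_anchored and the chunk size of 1024 are illustrative choices, not taken from the answer.

```python
import numpy as np

def rolling_mean_anchored(y, window, chunk=1024):
    """Rolling mean that restarts (anchors) the running sum every `chunk` windows."""
    y = np.asarray(y)                      # assumed floating point, len(y) >= window
    n = len(y) - window + 1                # number of full windows
    out = np.empty(n, dtype=y.dtype)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Anchor: sum the first window of this chunk directly.
        first = y[start:start + window].sum()
        # Each later window sum differs from the previous one by y[i + window] - y[i].
        diffs = y[start + window:stop + window - 1] - y[start:stop - 1]
        out[start:stop] = np.cumsum(np.r_[first, diffs]) / window
    return out

y = (1000 + np.sin(np.arange(1000000))).astype(np.float32)
out = rolling_mean_anchored(y, 6)
print(out[-1], np.mean(y[-6:]))            # the two values should be close
```

Rounding error can then only accumulate within one chunk, because every chunk starts from a directly computed window sum. A smaller chunk trades some speed for a tighter error bound.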

Solution 1
2 2018-02-20 07:18:54

Solution 2
2 2018-02-20 07:27:11

Solution 3
2 2018-02-20 08:19:25
