How to avoid this nested for loop

Question

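Does this code need the nested for loop, or is there a more efficient way to solve this?

This is a simplified version that searches a dataset of 20 random integers from 1 to 1000 for consecutive overlapping intervals. It runs through error values from 1 to 100, creating the intervals by adding each error value to, and subtracting it from, the 20 random integers.

Example:

Input, assuming the dataframe has size 10 instead of 20: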

df = [433, 3, 4, 5, 6, 7, 378, 87, 0, 500]

Output for error = 1 in the for loop:

overlaps = {0:[[1, 2, 3, 4, 5]]}
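To make the interval idea concrete, here is a minimal sketch using the sample values above. It assumes that two intervals overlap when each one starts no later than the other ends, which is how the condition in the function below reads. The helper name `intervals_overlap` is made up for illustration, and the sketch only compares each interval with its immediate neighbour, which is a looser test than the running minimum/maximum the function keeps.

```python
import numpy as np

values = np.array([433, 3, 4, 5, 6, 7, 378, 87, 0, 500])
error = 1

# Each value x becomes the closed interval [x - error, x + error].
lows = values - error
highs = values + error

def intervals_overlap(lo_a, hi_a, lo_b, hi_b):
    """Hypothetical helper: True when [lo_a, hi_a] and [lo_b, hi_b] share a point."""
    return bool(lo_a <= hi_b and hi_a >= lo_b)

# Compare each interval with the one that follows it.
pairwise = [
    intervals_overlap(lows[i], highs[i], lows[i + 1], highs[i + 1])
    for i in range(len(values) - 1)
]
print(pairwise)
# [False, True, True, True, True, False, False, False, False]
```

The four consecutive True entries correspond to the neighbouring pairs among indexes 1 to 5, the same indexes listed in the example output.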

import numpy as np
import pandas as pd


def find_overlap(df, error):
    """
    df: dataframe with random 20 integers from 1-1000
    error: used to create the interval by +/- to each value in the dataframe
    returns: list of list of indexes overlapping
    """

    # add the interval to the dataframe as columns of minimum and maximum
    df["min"] = df["x"] - error
    df["max"] = df["x"] + error

    # overlaps stores lists of indexes that overlap
    overlaps = []

    # initial state: no indexes collected yet; the running interval starts as row 0's
    temporary = []
    minimum = df["min"].iloc[0]
    maximum = df["max"].iloc[0]

    # iterates through the dataframe checking for overlap between successive intervals
    for index, row in df.iterrows():
        current_min = row["min"]
        current_max = row["max"]

        # yes overlap
        if (current_min <= maximum) and (current_max >= minimum):
            temporary.append(index)
            if current_min > minimum:
                minimum = current_min
            if current_max < maximum:
                maximum = current_max
            continue

        # no overlap - also check for 5 successive overlaps
        if len(temporary) >= 5:
            overlaps.append(temporary)
        temporary = [index]
        minimum = current_min
        maximum = current_max

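    # the loop only stores a run when a non-overlapping row ends it,
    # so check the final run as well
    if len(temporary) >= 5:
        overlaps.append(temporary)

    return overlaps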



# creates dataframe with 20 random integers from 1 to 1000
df = pd.DataFrame(np.random.randint(1, 1000, 20), columns=["x"])

overlaps = {}
for error in range(0,100):
    lst = find_overlap(df, error)
    if len(lst):
        overlaps[error] = lst

print(overlaps)

Answer 1

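So, from what I understand of your code, you are looking to:

1. Calculate the differences between all values of x.
2. Determine whether each difference is less than error, where error comes from the range [0, 100).
3. Select all sub-arrays of size 5.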

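Assuming my interpretation is correct, you can actually vectorize this and avoid the for loop, as your intuition suggested. If my interpretation turns out to be wrong, this should at least give you a decent start on a vectorized version of the code you need. 🙂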
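Read literally, the three steps listed above could be sketched as follows. This is only one possible reading and not the code from this answer: it assumes "differences" means differences between neighbouring values, uses `<=` rather than a strict `<` so that `error = 0` can still match equal neighbours, and treats a sub-array of size 5 as five consecutive values, that is, four consecutive small differences. The function name `runs_of_five` is hypothetical.

```python
import numpy as np

def runs_of_five(values, error):
    """Hypothetical helper: start indexes of 5 consecutive values whose
    neighbouring differences are all <= error."""
    values = np.asarray(values)
    if values.size < 5:
        return []
    # Step 1: absolute difference between each value and the next one.
    diffs = np.abs(np.diff(values))
    # Step 2: flag the differences that fall within the error margin.
    close = (diffs <= error).astype(int)
    # Step 3: five consecutive values means four consecutive flagged differences.
    window_sums = np.convolve(close, np.ones(4, dtype=int), "valid")
    return np.nonzero(window_sums == 4)[0].tolist()

sample = [433, 3, 4, 5, 6, 7, 378, 87, 0, 500]
print(runs_of_five(sample, error=1))  # [1]
```

With the sample data from the question, the single start index 1 corresponds to indexes 1 to 5.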

Updated solution (accounting for 5-tuples)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 1000, 20), columns=["x"])

overlaps = {}

for margin in range(0, 100):
    diffs = np.abs(df["x"].values - np.roll(df["x"], margin))
    # np.convolve is analogous to a sliding window sum
    quint = np.convolve(diffs == margin, np.ones(5), "valid")
    index = np.nonzero(quint == 5)[0]
    if index.size > 0:
        overlaps[margin] = [list(range(i, i + 5)) for i in index]
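The comment about `np.convolve` may be easier to see on a small boolean mask. The sketch below uses made-up data (the `mask` array is illustrative, not taken from the question): convolving with a kernel of five ones in "valid" mode yields one sum per window of five consecutive entries, so a sum of 5 means all five entries were True.

```python
import numpy as np

# Illustrative mask: True where some condition holds at that position.
mask = np.array([False, True, True, True, True, True, False, True])

# One sum per window of 5 consecutive entries; "valid" drops partial windows.
window_sums = np.convolve(mask.astype(int), np.ones(5, dtype=int), "valid")
print(window_sums)  # [4 5 4 4]

# A window sum of 5 means every entry in that window was True.
starts = np.nonzero(window_sums == 5)[0]
print(starts)  # [1]
```

Each entry of `starts` is the first index of a full window, which is why the solution above expands it with `range(i, i + 5)`.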

Original solution (without accounting for 5-tuples)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 1000, 20), columns=["x"])

overlaps = {}

for margin in range(0, 100):
    diffs = np.abs(df["x"].values - np.roll(df["x"], margin))
    # positions where the difference equals the current margin
    index = np.nonzero(diffs == margin)[0]
    if index.size > 0:
        overlaps[margin] = index

If you're not familiar with numpy, .size gives you the total number of elements in an ndarray. (So a 3D array with shape (10, 20, 30) has a size of 6000.)
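As a quick check of that arithmetic, and of how `.size` serves as an emptiness test on the result of `np.nonzero`, here is a small sketch with made-up arrays:

```python
import numpy as np

arr = np.zeros((10, 20, 30))
print(arr.shape)  # (10, 20, 30)
print(arr.size)   # 6000, i.e. 10 * 20 * 30

# np.nonzero returns a tuple of index arrays; [0] picks the first axis.
hits = np.nonzero(np.array([0, 3, 0, 7]))[0]
print(hits, hits.size > 0)  # [1 3] True

no_hits = np.nonzero(np.zeros(4))[0]
print(no_hits.size > 0)  # False
```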

Solution 1
1 accepted 2021-07-22 03:36:14
