Method to avoid this nested for-loop
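Does this code need the nested for loop, or is there a more efficient way to solve this?
This is a simplified version. It searches for consecutive overlapping intervals in a dataset of 20 random integers from 1 to 1000. It runs through error values from 0 to 99, creating each interval by adding/subtracting the error to/from each of the 20 random integers.
Example:
Input, assuming the dataframe has size 10 instead of 20:
df = [433, 3, 4, 5, 6, 7, 378, 87, 0, 500]
Output for error = 1 in the for loop:
overlaps = {0: [[1, 2, 3, 4, 5]]}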
import numpy as np
import pandas as pd


def find_overlap(df, error):
    """
    df: dataframe with 20 random integers from 1-1000 in column "x"
    error: used to create the interval by +/- to each value in the dataframe
    returns: list of lists of overlapping indexes
    """
    # add the interval to the dataframe as columns of minimum and maximum
    df["min"] = df["x"] - error
    df["max"] = df["x"] + error

    # overlaps stores lists of indexes that overlap
    overlaps = []

    # fill in data for start: row 0 seeds the first run
    temporary = [0]
    minimum = df["min"].iloc[0]
    maximum = df["max"].iloc[0]

    # iterates through the dataframe checking for overlap between successive intervals
    # (row 0 is skipped because it already seeds `temporary`)
    for index, row in df.iloc[1:].iterrows():
        current_min = row["min"]
        current_max = row["max"]

        # yes overlap: extend the run and shrink the shared interval
        if (current_min <= maximum) and (current_max >= minimum):
            temporary.append(index)
            if current_min > minimum:
                minimum = current_min
            if current_max < maximum:
                maximum = current_max
            continue

        # no overlap - also check for 5 successive overlaps
        if len(temporary) >= 5:
            overlaps.append(temporary)
        temporary = [index]
        minimum = current_min
        maximum = current_max

    # the last run is not terminated by a later row, so check it here
    if len(temporary) >= 5:
        overlaps.append(temporary)
    return overlaps


# creates dataframe with 20 random integers from 1 to 999 (randint's upper bound is exclusive)
df = pd.DataFrame(np.random.randint(1, 1000, 20), columns=["x"])

overlaps = {}
for error in range(0, 100):
    lst = find_overlap(df, error)
    if len(lst):
        overlaps[error] = lst

print(overlaps)
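So, from what I understand of your code, you're looking for the differences between the values of x, compared against error, where error ranges over [0, 100).
Assuming my interpretation is correct, you can actually vectorize this and avoid the inner, row-by-row for loop, just as your intuition led you to believe. And if my interpretation is incorrect, this should at least give you a decent start toward a vectorized version of the code you want. 🙂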
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 1000, 20), columns=["x"])

overlaps = {}
for margin in range(0, 100):
    # absolute difference between each value and the one `margin` positions earlier
    # (np.roll wraps around at the start of the array)
    diffs = np.abs(df["x"].values - np.roll(df["x"].values, margin))
    # np.convolve is analogous to a sliding window sum
    quint = np.convolve(diffs == margin, np.ones(5), "valid")
    # start positions of windows in which all 5 entries matched
    index = np.nonzero(quint == 5)[0]
    if index.size > 0:
        overlaps[margin] = [list(range(i, i + 5)) for i in index]
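To make the sliding-window comment concrete, here is a minimal sketch of np.convolve acting as a window sum over a boolean mask. The mask values are made up for illustration and are not taken from the question's data.

```python
import numpy as np

# Hypothetical boolean mask (illustrative values only).
mask = np.array([False, True, True, True, True, True, False, True])

# Convolving with five ones sums every length-5 window of the mask
# (True counts as 1). "valid" keeps only windows that fit entirely.
window_sums = np.convolve(mask.astype(int), np.ones(5), "valid")

# A window sum of 5 means all five entries in that window are True.
starts = np.nonzero(window_sums == 5)[0]

print(window_sums)
print(starts)
```

Only the window starting at position 1 consists entirely of True values, so that is the only start index reported.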
# variant without the 5-in-a-row window: records every matching position
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 1000, 20), columns=["x"])

overlaps = {}
for margin in range(0, 100):
    diffs = np.abs(df["x"].values - np.roll(df["x"].values, margin))
    index = np.nonzero(diffs == margin)[0]
    if index.size > 0:
        overlaps[margin] = index
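If what you actually need is the question's notion of overlap (a run of intervals that all share a common point), one possible direction is sketched below. It relies on the fact that equal-width intervals [x - error, x + error] share a point exactly when max(x) - min(x) <= 2 * error. The function name windows_with_common_overlap is hypothetical, and the sketch assumes NumPy 1.20 or newer for sliding_window_view. It reports every length-5 window rather than the greedy runs built by find_overlap, so treat it as a starting point and not a drop-in replacement.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windows_with_common_overlap(x, error, width=5):
    """Return start indexes of `width`-long windows whose intervals
    [x - error, x + error] all share at least one common point.
    (Hypothetical helper, not part of the question's code.)"""
    windows = sliding_window_view(np.asarray(x), width)
    # Equal-width intervals share a point iff max(x) - min(x) <= 2 * error.
    spread = windows.max(axis=1) - windows.min(axis=1)
    return np.nonzero(spread <= 2 * error)[0]


# Example data from the question.
x = [433, 3, 4, 5, 6, 7, 378, 87, 0, 500]
starts = windows_with_common_overlap(x, error=2)
print([list(range(s, s + 5)) for s in starts])
```

With the question's example data, error=2 flags the window covering indexes 1 to 5, while error=1 flags nothing because the values 3 to 7 span 4, which exceeds 2 * 1.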
If you're not familiar with numpy, .size gives you the total number of elements in an ndarray. (So a 3D array of shape (10, 20, 30) has a size of 6000.)
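A quick way to check this yourself is the minimal sketch below. The array contents are arbitrary zeros; only the shape matters.

```python
import numpy as np

# Arbitrary 3D array; .size is the product of the shape's dimensions.
arr = np.zeros((10, 20, 30))
print(arr.shape)
print(arr.size)

# An empty result from np.nonzero has size 0, which is why
# `index.size > 0` works as an "anything found?" test.
empty = np.nonzero(np.array([False, False]))[0]
print(empty.size)
```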