Vectorized Python code to iterate over and modify each column of a Pandas DataFrame within a window

Question

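I have a DataFrame of 1s and 0s, and I loop over each column. When the iteration hits a 1, that 1 should stay in the column. However, if there are any 1s in the next n positions after it, I should turn them into zeros. I then repeat the same process until the end of the column, and do all of this for every column.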
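To pin down the intended behavior, here is a minimal plain-Python sketch of the loop described above. It assumes that after a kept 1 at position i, positions i+1 through i+n are cleared and the scan resumes at i+n+1; `clear_window_loop` is a hypothetical name, not the asker's code.

```python
import pandas as pd

def clear_window_loop(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical reference loop: keep a 1, zero the next n rows, repeat per column."""
    out = df.copy()
    for col in out.columns:
        values = out[col].tolist()
        i = 0
        while i < len(values):
            if values[i] == 1:
                # zero the next n positions, stopping at the end of the column
                for j in range(i + 1, min(i + 1 + n, len(values))):
                    values[j] = 0
                i += n + 1  # resume the scan after the cleared window
            else:
                i += 1
        out[col] = values
    return out

df = pd.DataFrame({"a": [1, 1, 0, 1, 1, 1], "b": [0, 1, 1, 1, 0, 1]})
result = clear_window_loop(df, 2)
# column a -> [1, 0, 0, 1, 0, 0], column b -> [0, 1, 0, 0, 0, 1]
```

Note that the 1 at row 3 of column `a` survives even though row 1 held a 1: row 1 was itself cleared, so only kept 1s start a new window.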

Is it possible to get rid of the loops and vectorize everything using DataFrame/matrix/array operations in pandas/NumPy? How should I go about it? n can range from 2 to 100.

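I tried the following function, but it does not work: it keeps a 1 only if there are at least n zeros between it and the previous 1, which is clearly not what I need: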

import numpy as np
import pandas as pd

def clear_window(df, n):

    # create buffer of size n
    pad = pd.DataFrame(np.zeros([n, df.shape[1]]),
                       columns=df.columns)
    padded_df = pd.concat([pad, df])

    # compute rolling sum and cut off the buffer
    roll = (padded_df
            .rolling(n+1)
            .sum()
            .iloc[n:, :]
           )

    # delete ones where rolling sum is above 1 or below -1
    result = df * ((roll == 1.0) | (roll == -1.0)).astype(int)

    return result
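A small demonstration of why the rolling-sum approach fails: the rolling sum counts all original 1s in the window, including ones that should already have been cleared, whereas the desired rule only lets *kept* 1s open a new window. The function below is the attempt above, condensed; the test column is made up for illustration.

```python
import numpy as np
import pandas as pd

def clear_window(df, n):
    # the rolling-sum attempt from the question, condensed
    pad = pd.DataFrame(np.zeros([n, df.shape[1]]), columns=df.columns)
    padded_df = pd.concat([pad, df])
    roll = padded_df.rolling(n + 1).sum().iloc[n:, :]
    return df * ((roll == 1.0) | (roll == -1.0)).astype(int)

df = pd.DataFrame({"a": [1, 1, 0, 1]})
got = clear_window(df, 2)["a"].tolist()
# got == [1, 0, 0, 0], but the desired result is [1, 0, 0, 1]:
# the window ending at row 3 still counts the 1 at row 1, which should have been cleared
```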

Answer 1

If you can't find a way to vectorize this, Numba will let you solve these kinds of sequential loop problems faster.

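This code walks down the rows of each column looking for the target value. When it finds the target value (1), it sets the next n rows to the fill value (0). The search index then jumps past the filled rows, and the next search begins.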

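import numpy as np
import pandas as pd
from numba import jit

@jit(nopython=True)
def find_and_fill(arr, span, tgt_val=1, fill_val=0):
    # modifies arr in place (and also returns it)
    start_idx = 0
    end_idx = arr.size
    while start_idx < end_idx:
        if arr[start_idx] == tgt_val:
            # keep this value; overwrite the next `span` entries (slicing past the end is safe)
            arr[start_idx + 1 : start_idx + 1 + span] = fill_val
            # skip over the filled rows and resume the search
            start_idx = start_idx + 1 + span
        else:
            start_idx = start_idx + 1
    return arr

# get a writable copy of the DataFrame values as a NumPy array
# (df.values can be read-only under pandas Copy-on-Write, and Numba cannot write to it)
a = df.to_numpy(copy=True)

# each row of a.T is a view of one column of a, so find_and_fill updates a in place
for col in a.T:
    # fill span is set to 6 in this example
    find_and_fill(col, 6)

# build the result DataFrame from the modified array
df2 = pd.DataFrame(a, index=df.index, columns=df.columns)

# df2 now contains the result values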
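Because whether a 1 is kept depends on which earlier 1s were kept, the rows have to be processed in order; what can be vectorized is the loop over columns. A minimal NumPy sketch of that idea, assuming the same semantics as `find_and_fill` (keep a 1, clear the next n rows, resume after them); `clear_window_rowwise` is a hypothetical name:

```python
import numpy as np
import pandas as pd

def clear_window_rowwise(df: pd.DataFrame, n: int) -> pd.DataFrame:
    a = df.to_numpy() == 1                      # boolean mask of the 1s
    out = np.zeros(a.shape, dtype=bool)
    # rows elapsed since the last kept 1 in each column; start above n so the first 1 is kept
    since_last = np.full(a.shape[1], n + 1, dtype=np.int64)
    for i in range(a.shape[0]):                 # Python loop over rows only
        keep = a[i] & (since_last > n)          # all columns handled at once
        out[i] = keep
        since_last = np.where(keep, 1, since_last + 1)
    return pd.DataFrame(out.astype(np.int64), index=df.index, columns=df.columns)

df = pd.DataFrame({"a": [1, 1, 0, 1, 1, 1], "b": [0, 1, 1, 1, 0, 1]})
res = clear_window_rowwise(df, 2)
# column a -> [1, 0, 0, 1, 0, 0], column b -> [0, 1, 0, 0, 0, 1]
```

This still loops over rows in Python, so it mainly pays off with many columns; for long columns the Numba version above is likely to be faster.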

Answer 1 (score 0), 2018-09-26 22:09:17

矢量化的Python代碼，用於迭代和更改窗口中Pandas DataFrame的每一列

問題描述

1 個解決方案

解決方案1 0 2018-09-26 22:09:17

解決方案1
0 2018-09-26 22:09:17