
Vectorized Python code for iterating and changing each column of a Pandas DataFrame within a window

I have a dataframe of ones and zeros, and I iterate over each column with a loop. When I hit a one, I keep it in the column, but any ones in the next n positions after it must be turned into zeros. I then repeat the same up to the end of the column, and repeat all of this for each column.
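To make the rule concrete, here is a minimal sketch of the loop for a single column. The function name `suppress_window` and the sample array are hypothetical, and the sketch assumes the scan resumes right after the n zeroed positions.

```python
import numpy as np

def suppress_window(col, n):
    """Keep each 1 that is reached, zero the next n positions, then resume after them."""
    out = np.asarray(col).copy()
    i = 0
    while i < len(out):
        if out[i] == 1:
            # zero the n positions following the kept one (slice is clipped at the end)
            out[i + 1 : i + 1 + n] = 0
            i += n + 1
        else:
            i += 1
    return out

example = np.array([1, 1, 1, 0, 1, 1, 0, 1])
print(suppress_window(example, 2))  # [1 0 0 0 1 0 0 1]
```

Whether a given one survives depends on which earlier ones were kept, which is what makes the problem sequential.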

Is it possible to get rid of the loop and vectorize everything with dataframe/matrix/array operations in pandas/numpy? If so, how should I go about it? n can be anywhere from 2 to 100.

I tried the function below, but it fails: it keeps ones only if there are at least n zeros between them, which is obviously not what I need:

import numpy as np
import pandas as pd

def clear_window(df, n):
    # create buffer of size n
    pad = pd.DataFrame(np.zeros([n, df.shape[1]]),
                       columns=df.columns)
    padded_df = pd.concat([pad, df])

    # compute rolling sum and cut off the buffer
    roll = (padded_df
            .rolling(n+1)
            .sum()
            .iloc[n:, :]
           )

    # delete ones where rolling sum is above 1 or below -1
    result = df * ((roll == 1.0) | (roll == -1.0)).astype(int)

    return result

If you can't find a way to vectorize, Numba will speed up sequential looping problems like this one.

This code loops through every row looking for a target value. When a target value (1) is found, the next n rows are set to the fill value (0). The search row index is then incremented to skip over the filled rows, and the next search begins.

from numba import jit

@jit(nopython=True)
def find_and_fill(arr, span, tgt_val=1, fill_val=0):
    start_idx = 0
    end_idx = arr.size
    while start_idx < end_idx:
        if arr[start_idx] == tgt_val:
            arr[start_idx + 1 : start_idx + 1 + span] = fill_val
            start_idx = start_idx + 1 + span
        else:
            start_idx = start_idx + 1
    return arr

df2 = df.copy()
# copy the dataframe values into a writable numpy array
a = df2.to_numpy(copy=True)

# transpose and run the function for each column of the dataframe;
# find_and_fill modifies each column view in place, so `a` is updated directly
for col in a.T:
    # fill span is set to 6 in this example
    find_and_fill(col, 6)

# assign the array back to the dataframe
df2[list(df2.columns)] = a

# df2 now contains the result values
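If Numba is not available, one possible alternative is to keep the loop over rows but process all columns at once with NumPy. This is a sketch rather than part of the answer above: the function name `clear_window_rows`, the `cooldown` counter, and the sample dataframe are hypothetical, and it assumes the dataframe holds only numeric ones and zeros.

```python
import numpy as np
import pandas as pd

def clear_window_rows(df, n):
    """Loop over rows once; each step handles every column with array operations."""
    values = df.to_numpy()
    out = np.zeros_like(values)
    # cooldown[j] = number of upcoming rows in column j that must still be zeroed
    cooldown = np.zeros(values.shape[1], dtype=np.int64)
    for i in range(values.shape[0]):
        # a one is kept only if its column is not inside a zeroing window
        keep = (values[i] == 1) & (cooldown == 0)
        out[i, keep] = 1
        # kept ones start a new window of n rows; other columns count down
        cooldown = np.where(keep, n, np.maximum(cooldown - 1, 0))
    return pd.DataFrame(out, index=df.index, columns=df.columns)

df = pd.DataFrame({"a": [1, 1, 1, 0, 1, 1, 0, 1],
                   "b": [0, 1, 0, 1, 1, 1, 1, 0]})
result = clear_window_rows(df, 2)
print(result)
```

This still runs one Python-level iteration per row, so it is likely to help mainly when the dataframe has many columns relative to its number of rows.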

