Rolling PCA on pandas dataframe

I'm wondering if anyone knows how to implement a rolling/moving window PCA on a pandas dataframe. I've looked around and found implementations in R and MATLAB, but not in Python. Any help would be appreciated!

This is not a duplicate: moving window PCA is not the same as PCA on the entire dataframe. Please see pandas.DataFrame.rolling() if you do not understand the difference.

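Unfortunately, pandas.DataFrame.rolling() does not seem to hand whole windows of rows to the applied function. Its apply() works on one column at a time and passes a 1-D window of values, so it cannot be used as one might expect to roll over the rows of the df and pass windows of rows to the PCA.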
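The behaviour described above can be checked with a small experiment. This is only a sketch: the toy frame and the helper name record_shape are made up for illustration. It records the shape of every window that apply() passes to the function.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): 6 rows, 2 columns
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), columns=["a", "b"])

seen_shapes = []

def record_shape(window_values):
    # With raw=True the window arrives as a NumPy array
    seen_shapes.append(window_values.shape)
    return 0.0  # apply() expects a numeric return value

df.rolling(3).apply(record_shape, raw=True)

# Every recorded shape is 1-D: the function never sees a (3, 2) block of rows
print(set(seen_shapes))
```

Because each call only sees a single column, a function that needs all features of a window at once (such as a PCA fit) has to get at the rows some other way.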

The following is a workaround based on rolling over row indices instead of rows. It may not be very elegant, but it works:

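import numpy as np
import pandas as pd
# Assumption: PCA is scikit-learn's implementation (the original snippet omits the imports)
from sklearn.decomposition import PCA

# Generate some data (1000 time points, 10 features)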
data = np.random.random(size=(1000,10))
df = pd.DataFrame(data)

# Set the window size
window = 100

# Initialize an empty df of appropriate size for the output
df_pca = pd.DataFrame( np.zeros((data.shape[0] - window + 1, data.shape[1])) )

# Define PCA fit-transform function
# Note: Instead of attempting to return the result, 
#       it is written into the previously created output array.
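def rolling_pca(window_data):
    # With raw=True the window is a float ndarray of row positions; cast to int for iloc
    idx = window_data.astype(int)
    pca = PCA()
    transf = pca.fit_transform(df.iloc[idx])
    # Store the PCA coordinates of the window's first row at the window's start position
    df_pca.iloc[idx[0]] = transf[0, :]
    return 1.0  # dummy value; `apply` needs a numeric return

# Create a df containing row indices for the workaround
df_idx = pd.DataFrame(np.arange(df.shape[0]))

# Use `rolling` to apply the PCA function (raw=True passes plain ndarrays)
_ = df_idx.rolling(window).apply(rolling_pca, raw=True)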

# The results are now contained here:
print(df_pca)

A quick check reveals that the values produced by this are identical to control values computed by slicing appropriate windows manually and running PCA on them.
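For reference, the manual control mentioned above can be sketched as a plain loop over window start positions. The function name rolling_pca_loop and the demo data are hypothetical, and PCA is assumed to be scikit-learn's. Like the workaround, it keeps the transformed first row of each window.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA  # assumption: scikit-learn's PCA

def rolling_pca_loop(frame, window):
    """Fit a PCA on each window of rows and keep the transformed first row."""
    n_windows = len(frame) - window + 1
    out = np.empty((n_windows, frame.shape[1]))
    for start in range(n_windows):
        block = frame.iloc[start:start + window]      # rows start .. start+window-1
        out[start] = PCA().fit_transform(block)[0, :]  # first row in PCA coordinates
    return pd.DataFrame(out)

# Hypothetical demo data: 300 time points, 5 features
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((300, 5)))
result = rolling_pca_loop(demo, window=50)
print(result.shape)  # (251, 5)
```

The loop avoids writing into a global output frame from inside apply(), at the cost of not using rolling() at all. Which rows or components to keep per window is a design choice; the first transformed row is used here only to mirror the workaround.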
