Python pandas: rolling window and re-creating a DataFrame

Question

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'user' : ['A', 'A', 'A', 'B', 'B', 'B','B'],
                  'attritube1' : [0,1,1,1,0,2,9],
                  'attritube2':[1,2,3,3,0,0,1]})
print(df)

     attritube1  attritube2 user
0           0           1    A
1           1           2    A
2           1           3    A
3           1           3    B
4           0           0    B
5           2           0    B
6           9           1    B

For each user, I want to slice the data with a rolling window of length K and build a new dataset. For example, with K = 2, I want to get:

   attritube1  attritube2 user
0           0           1    A
1           1           2    A
---------------------------------
2           1           2    A
3           1           3    A
---------------------------------
4           1           3    B
5           0           0    B
---------------------------------
6           0           0    B
7           2           0    B
--------------------------------
8           2           0    B
9           9           1    B

Similarly, with K = 3, the new DataFrame should be:

    attritube1  attritube2 user
0           0           1    A
1           1           2    A
2           1           3    A
--------------------------------
3           1           3    B
4           0           0    B
5           2           0    B
--------------------------------
6           0           0    B
7           2           0    B
8           9           1    B

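We can assume that every user has at least K rows. Thanks!

Edit: to clarify, I want to repeat the rolling-window process for each user (A and B in the toy example).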

Answer 1

Try:

k = 3
# For each user, stack every window of k consecutive rows
df.groupby('user').apply(lambda x: pd.concat([x.iloc[i: i + k] for i in range(len(x.index) - k + 1)]))


        attritube1  attritube2 user
user                               
A    0           0           1    A
     1           1           2    A
     2           1           3    A
B    3           1           3    B
     4           0           0    B
     5           2           0    B
     4           0           0    B
     5           2           0    B
     6           9           1    B
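The output above keeps the `user` level and the original row numbers in the index, whereas the question shows a flat index running from 0. The following is a minimal sketch of the same slicing idea written as an explicit loop, which also sidesteps the fact that `groupby(...).apply` has treated the grouping column differently across pandas versions. The function name `rolling_slices` and its `group_col` parameter are made up for this illustration and are not part of pandas.

```python
import pandas as pd


def rolling_slices(frame, k, group_col="user"):
    """Stack every window of k consecutive rows, one group at a time."""
    pieces = []
    # sort=False keeps groups in order of first appearance
    for _, group in frame.groupby(group_col, sort=False):
        # A group with n rows yields n - k + 1 windows
        for start in range(len(group) - k + 1):
            pieces.append(group.iloc[start:start + k])
    # ignore_index=True renumbers the rows 0..N-1, as in the question
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'attritube1': [0, 1, 1, 1, 0, 2, 9],
                   'attritube2': [1, 2, 3, 3, 0, 0, 1]})

result = rolling_slices(df, k=2)
print(result)
```

With K = 2 this should give the ten rows listed in the question, and with K = 3 the nine rows. It relies on the question's assumption that every user has at least K rows; a user with fewer rows simply contributes no windows.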

Answer 2

df = pd.DataFrame({'user' : ['A', 'A', 'A', 'B', 'B', 'B','B','A', 'A', 'A', 'B', 'B', 'C','B','A', 'C', 'C', 'B', 'B', 'B','B'],
              'attritube1' : [0,1,1,1,0,2,9,0,1,1,1,0,2,9,0,1,1,1,0,2,9],
              'attritube2':[1,2,3,3,0,0,1,0,1,1,1,0,2,9,0,1,1,1,0,2,9]})


# Build a MultiIndex DataFrame indexed by (user, original row number)
m_df = df.set_index(df["user"], append=True)
m_df = m_df.swaplevel(0, 1, axis=0)


k = 2


# Take the first k rows of each user and stack them
final_df = pd.concat([m_df.loc[item].iloc[:k] for item in sorted(set(df["user"]))])
final_df.index = range(final_df.shape[0])  # reset the index to 0..n-1


print(final_df)

This answer uses a MultiIndex DataFrame and proceeds step by step, which (at least to me) is easier to read.
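Note that `.iloc[:k]` selects only the first k rows of each user, so the code above yields one window per user. A possible step-by-step variant that walks through every window, and also tags each row with the window it came from, might look like the sketch below. The helper name `labelled_windows` and the extra `window` column are hypothetical, introduced only for illustration.

```python
import pandas as pd


def labelled_windows(frame, k, group_col="user"):
    """Collect every k-row window per group and tag rows with a window id."""
    pieces = []
    window_id = 0
    for _, group in frame.groupby(group_col, sort=False):
        for start in range(len(group) - k + 1):
            window = group.iloc[start:start + k]
            # 'window' is a hypothetical column name added for illustration
            pieces.append(window.assign(window=window_id))
            window_id += 1
    return pd.concat(pieces, ignore_index=True)


toy = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                    'attritube1': [0, 1, 1, 1, 0, 2, 9],
                    'attritube2': [1, 2, 3, 3, 0, 0, 1]})

labelled = labelled_windows(toy, k=3)
print(labelled)
```

The `window` column plays the role of the dashed separators in the question: `labelled.groupby('window')` would let each window be processed on its own.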

Solution 1
2, accepted, 2016-03-30 16:40:21

Solution 2
0, 2016-03-30 16:37:07
