Iterate over pandas data frame and select n number of rows and columns at a time
So, I have a dataset that looks like this:
# Example
0 1 2 3 4 5
0 18 1 -19 -16 -5 19
1 18 0 -19 -17 -6 19
2 17 -1 -20 -17 -6 19
3 18 1 -19 -16 -5 20
4 18 0 -19 -16 -5 20
Actual data:
[{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
{0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
{0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
{0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
{0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
{0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
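The shape of the above is (20, 6).
What I want to achieve is to apply a custom function to every column, 4 rows at a time.
For example: f() applied to all columns of df.ix[0:3]; f() applied to all columns of df.ix[4:7]; and so on.
In other words, I need a rolling window of size 4 that moves with a step of 4.
With the data above, the result would be a data frame of shape (5, 6). For the sake of argument, you can assume the custom function takes the mean of those 4 rows for each column.
What have I tried so far?
Here is the code: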
import pandas as pd

curr = 0
res = []
while curr < df_to_look_at2.shape[0]:
    # .ix was removed in pandas 1.0; .iloc excludes the end bound, so use curr+4
    look_at = df_to_look_at2.iloc[curr:curr+4]
    curr += 4
    res.append(look_at.mean().values.tolist())
pd.DataFrame(res)
Result:
0 1 2 3 4 5
0 17.75 0.25 -19.25 -16.50 -5.50 19.25
1 18.25 0.25 -19.00 -16.00 -5.25 19.50
2 17.75 0.25 -19.25 -16.75 -5.75 19.00
3 17.75 0.25 -19.00 -16.00 -4.75 19.75
4 17.75 0.25 -18.75 -14.75 -3.75 21.00
One more thought: what if it should compute not only the mean, but also min(), max(), mean() and some other custom functions?
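rolling is the right tool here if you want a row to be considered in more than one window. However, your windows do not overlap, so what you are really asking is how to group by your step, which you can do using arange and floor division.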
import numpy as np
window_size = 4
# Floor-divide row positions: rows 0-3 get label 0, rows 4-7 get label 1, ...
grouper = np.arange(df.shape[0]) // window_size
df.groupby(grouper).mean()
0 1 2 3 4 5
0 17.75 0.25 -19.25 -16.50 -5.50 19.25
1 18.25 0.25 -19.00 -16.00 -5.25 19.50
2 17.75 0.25 -19.25 -16.75 -5.75 19.00
3 17.75 0.25 -19.00 -16.00 -4.75 19.75
4 17.75 0.25 -18.75 -14.75 -3.75 21.00
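The question also asks about computing several functions per window. A minimal sketch of one way to do that with the same grouper is to pass a list of functions to agg. The small frame and the value_range helper below are made up for illustration and are not part of the original data or answer.

```python
import numpy as np
import pandas as pd

# Stand-in frame (8 rows, 2 columns) so the example is self-contained;
# these values are made up and are not the question's data.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": [10, 20, 30, 40, 50, 60, 70, 80]})

window_size = 4
# Rows 0-3 get label 0, rows 4-7 get label 1.
grouper = np.arange(len(df)) // window_size


def value_range(col):
    """Hypothetical custom function: max minus min of one column in one window."""
    return col.max() - col.min()


# One output row per window; columns become (input column, function name) pairs.
summary = df.groupby(grouper).agg(["min", "max", "mean", value_range])
print(summary)
```

The result has a two-level column index, so a single statistic can be picked out with, for example, summary[("a", "mean")].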
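I think doing multiple calculations this way really belongs in numpy territory. You can use reshape to get the underlying array into the required format and then run whatever calculations you need on the array.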
inp = [{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
{0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
{0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
{0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
{0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
{0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
import pandas as pd
df = pd.DataFrame(inp)
temp = df.values.reshape(-1, 4, df.shape[-1])
out = pd.DataFrame(temp.mean(axis=1))
Output:
0 1 2 3 4 5
0 17.75 0.25 -19.25 -16.50 -5.50 19.25
1 18.25 0.25 -19.00 -16.00 -5.25 19.50
2 17.75 0.25 -19.25 -16.75 -5.75 19.00
3 17.75 0.25 -19.00 -16.00 -4.75 19.75
4 17.75 0.25 -18.75 -14.75 -3.75 21.00
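The reshape above only works when the number of rows is an exact multiple of the window size; otherwise numpy raises a ValueError. A sketch of one way to handle that, assuming it is acceptable to drop the leftover rows, and to compute several reductions from the same reshaped array. The 10-row frame is stand-in data, not the question's.

```python
import numpy as np
import pandas as pd

# Stand-in data: 10 rows, 2 columns (made up for illustration).
df = pd.DataFrame(np.arange(20).reshape(10, 2))

window_size = 4
# Keep only complete windows: 10 rows -> 8 rows, the last 2 are dropped.
n_full = (len(df) // window_size) * window_size

# Shape is (n_windows, window_size, n_columns).
blocks = df.to_numpy()[:n_full].reshape(-1, window_size, df.shape[1])

# Reduce over axis 1 (the rows inside each window) for each statistic.
means = pd.DataFrame(blocks.mean(axis=1), columns=df.columns)
mins = pd.DataFrame(blocks.min(axis=1), columns=df.columns)
maxs = pd.DataFrame(blocks.max(axis=1), columns=df.columns)
print(means)
```

If the trailing partial window should be kept rather than dropped, the groupby approach with arange and floor division handles it without any extra work, since the last group is simply shorter.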