
Iterate over pandas data frame and select n number of rows and columns at a time

So I have a dataset which looks like the following:

# Example
     0  1     2   3  4   5
0   18  1   -19 -16 -5  19
1   18  0   -19 -17 -6  19
2   17  -1  -20 -17 -6  19
3   18  1   -19 -16 -5  20
4   18  0   -19 -16 -5  20

Actual data:

[{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
 {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
 {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
 {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
 {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
 {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]

The shape of the above is (20, 6).

What I want to achieve is to apply a custom function to every column, 4 rows at a time.

Example:

  1. First iteration -> f() is applied to df.loc[0:3] (rows 0 to 3 inclusive) for all columns;
  2. Second iteration -> f() is applied to df.loc[4:7] (rows 4 to 7 inclusive) for all columns;

and so on ...

In effect, what I need is a rolling window of size 4 with a stride of 4.

The result for the above data would be a data frame of shape (5, 6). For the sake of argument, assume the custom function takes the mean of those 4 rows for each column.

What have I tried so far?

  1. I looked into rolling, but it doesn't do what I need: it rolls a window with a stride of 1.
  2. I had a go at implementing it myself, but I really need to optimise it because of the amount of data:

Here is the code:

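import pandas as pd

curr = 0
res = []
# Walk through the frame in blocks of 4 rows and average each block.
while curr < df_to_look_at2.shape[0]:
    # .ix has been removed from pandas; .iloc is end-exclusive, hence curr+4
    look_at = df_to_look_at2.iloc[curr:curr+4]
    curr += 4
    res.append(look_at.mean().values.tolist())
pd.DataFrame(res)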

and the result:

       0       1         2       3      4      5
0   17.75   0.25    -19.25  -16.50  -5.50   19.25
1   18.25   0.25    -19.00  -16.00  -5.25   19.50
2   17.75   0.25    -19.25  -16.75  -5.75   19.00
3   17.75   0.25    -19.00  -16.00  -4.75   19.75
4   17.75   0.25    -18.75  -14.75  -3.75   21.00

One extra thought: what if the function is not only the mean, but min(), max(), mean() and some other custom functions?

Rolling would be appropriate if you wanted a row to be considered in more than one window. Your windows, however, do not overlap, so what you are really asking is how to group rows by stride, which you can do with np.arange and floor division.

import numpy as np

window_size = 4
# 0,0,0,0,1,1,1,1,... : one group label per block of window_size rows
grouper = np.arange(df.shape[0]) // window_size

df.groupby(grouper).mean()

       0     1      2      3     4      5
0  17.75  0.25 -19.25 -16.50 -5.50  19.25
1  18.25  0.25 -19.00 -16.00 -5.25  19.50
2  17.75  0.25 -19.25 -16.75 -5.75  19.00
3  17.75  0.25 -19.00 -16.00 -4.75  19.75
4  17.75  0.25 -18.75 -14.75 -3.75  21.00
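The question also asks about applying min(), max(), mean() and custom functions together. A minimal sketch of how the same grouper might be combined with agg follows; the small frame and the value_range helper are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (8 rows, 2 columns); the values are made up for illustration.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": [8, 7, 6, 5, 4, 3, 2, 1]})

window_size = 4
grouper = np.arange(len(df)) // window_size  # 0,0,0,0,1,1,1,1


def value_range(col):
    """Hypothetical custom statistic: max minus min within one window."""
    return col.max() - col.min()


# One row per window; the columns become a (column, statistic) MultiIndex.
out = df.groupby(grouper).agg(["min", "max", "mean", value_range])
print(out)
```

Each original column ends up with one sub-column per statistic, so a single result can be selected with a tuple such as out[("a", "mean")].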

I think multiple calculations of this kind really belong on numpy turf. You can reshape the underlying array into the desired format and then compute on the array as needed.

inp = [{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
 {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
 {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
 {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
 {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
 {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]

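import pandas as pd
df = pd.DataFrame(inp)

# Shape (n_blocks, 4, n_columns); requires the row count to be a multiple of 4
temp = df.values.reshape(-1, 4, df.shape[-1])

# Reduce over the 4 rows within each block
out = pd.DataFrame(temp.mean(axis=1))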

Output:

       0     1      2      3     4      5
0  17.75  0.25 -19.25 -16.50 -5.50  19.25
1  18.25  0.25 -19.00 -16.00 -5.25  19.50
2  17.75  0.25 -19.25 -16.75 -5.75  19.00
3  17.75  0.25 -19.00 -16.00 -4.75  19.75
4  17.75  0.25 -18.75 -14.75 -3.75  21.00
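The reshape only works when the number of rows is an exact multiple of the window size. The sketch below shows one possible way to handle other lengths, by dropping the incomplete trailing block, and to compute several reductions from the same reshaped array. The 10-row frame is made up for illustration, and dropping the remainder is an assumption; the groupby approach would instead keep a shorter final group.

```python
import numpy as np
import pandas as pd

# Stand-in frame with 10 rows and 3 columns; 10 is deliberately not a multiple of 4.
df = pd.DataFrame(np.arange(30).reshape(10, 3))
window_size = 4

# Keep only the rows that form complete windows (assumption: the remainder is dropped).
n_full = (len(df) // window_size) * window_size
blocks = df.to_numpy()[:n_full].reshape(-1, window_size, df.shape[1])

# axis=1 runs over the rows inside each window, giving one output row per window.
means = pd.DataFrame(blocks.mean(axis=1), columns=df.columns)
mins = pd.DataFrame(blocks.min(axis=1), columns=df.columns)
maxs = pd.DataFrame(blocks.max(axis=1), columns=df.columns)
print(means)
```

Any function that accepts an axis argument can be applied to blocks in the same way.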
