So I have a dataset which looks like the following:
# Example
0 1 2 3 4 5
0 18 1 -19 -16 -5 19
1 18 0 -19 -17 -6 19
2 17 -1 -20 -17 -6 19
3 18 1 -19 -16 -5 20
4 18 0 -19 -16 -5 20
Actual data:
[{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
{0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
{0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
{0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
{0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
{0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
The shape of the above is (20, 6).
What I want to achieve is to apply a custom function to every column, 4 rows at a time.
Example: f() is applied to df.iloc[0:4] (rows 0 to 3) for all columns; f() is applied to df.iloc[4:8] (rows 4 to 7) for all columns; and so on.
In a way, what I need is a rolling window of size 4 with a stride of 4.
With the data above, the result would be a DataFrame of shape (5, 6). Just for the sake of argument, you can assume that the custom function takes the mean of those 4 rows for each column.
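The "rolling window of size 4 with stride 4" idea can be taken literally: compute an ordinary rolling mean and keep only every 4th result, i.e. the windows that end on rows 3, 7, 11, and so on. This is a minimal sketch, assuming the mean as the custom function and using only the first eight rows of the data above to keep it short. It computes every overlapping window and then throws most of them away, so it is wasteful compared with grouping the rows directly.

```python
import pandas as pd

# First 8 rows of the example data (two non-overlapping windows of 4 rows)
rows = [
    [18, 1, -19, -16, -5, 19],
    [18, 0, -19, -17, -6, 19],
    [17, -1, -20, -17, -6, 19],
    [18, 1, -19, -16, -5, 20],
    [18, 0, -19, -16, -5, 20],
    [18, 0, -20, -15, -4, 20],
    [19, 1, -18, -16, -5, 20],
    [18, 0, -19, -17, -7, 18],
]
df = pd.DataFrame(rows)

window = 4
# rolling(window) yields one result per row (NaN for the first window-1 rows);
# iloc[window - 1::window] keeps the windows ending on rows 3, 7, ...,
# which are exactly the blocks 0-3, 4-7, ...
strided = df.rolling(window).mean().iloc[window - 1::window].reset_index(drop=True)
print(strided)
```

The first row of the result matches the first row of the expected (5, 6) output (17.75, 0.25, -19.25, -16.50, -5.50, 19.25).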
What have I tried so far?
Here is the code:
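import pandas as pd

curr = 0
res = []
while curr < df_to_look_at2.shape[0]:
    # .ix has been removed from pandas; .iloc[curr:curr+4] selects the same 4 rows by position
    look_at = df_to_look_at2.iloc[curr:curr+4]
    curr += 4
    res.append(look_at.mean().values.tolist())
pd.DataFrame(res)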
And the result:
0 1 2 3 4 5
0 17.75 0.25 -19.25 -16.50 -5.50 19.25
1 18.25 0.25 -19.00 -16.00 -5.25 19.50
2 17.75 0.25 -19.25 -16.75 -5.75 19.00
3 17.75 0.25 -19.00 -16.00 -4.75 19.75
4 17.75 0.25 -18.75 -14.75 -3.75 21.00
One extra thought: what if, instead of only the mean, I need min(), max(), mean() and some other custom functions?
Rolling would be the right tool if you wanted to consider a row more than once, in more than one window. However, your windows do not overlap, so what you are really asking is how to group rows by stride, which you can do with np.arange and floor division:
import numpy as np
import pandas as pd
window_size = 4
# Row positions 0-3 -> group 0, 4-7 -> group 1, ...
grouper = np.arange(df.shape[0]) // window_size
df.groupby(grouper).mean()
0 1 2 3 4 5
0 17.75 0.25 -19.25 -16.50 -5.50 19.25
1 18.25 0.25 -19.00 -16.00 -5.25 19.50
2 17.75 0.25 -19.25 -16.75 -5.75 19.00
3 17.75 0.25 -19.00 -16.00 -4.75 19.75
4 17.75 0.25 -18.75 -14.75 -3.75 21.00
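For the follow-up about min(), max(), mean() and custom functions, the same grouper can also be passed to agg, which accepts a list of function names and callables. This is a minimal sketch, assuming only the first eight rows of the data; value_range is a hypothetical custom function used for illustration. The result has a two-level column index of (original column, function name).

```python
import numpy as np
import pandas as pd

# First 8 rows of the example data (two groups of 4 rows)
rows = [
    [18, 1, -19, -16, -5, 19],
    [18, 0, -19, -17, -6, 19],
    [17, -1, -20, -17, -6, 19],
    [18, 1, -19, -16, -5, 20],
    [18, 0, -19, -16, -5, 20],
    [18, 0, -20, -15, -4, 20],
    [19, 1, -18, -16, -5, 20],
    [18, 0, -19, -17, -7, 18],
]
df = pd.DataFrame(rows)

window_size = 4
grouper = np.arange(df.shape[0]) // window_size

# Hypothetical custom reducer: spread (max - min) of a column within one group
def value_range(s):
    return s.max() - s.min()

# Callables are labelled by their __name__ in the resulting column index
summary = df.groupby(grouper).agg(["min", "max", "mean", value_range])
print(summary)
```

Individual results can be pulled out with, for example, summary[(0, "mean")]. Unlike a reshape, grouping also tolerates a row count that is not a multiple of 4: the last group simply contains fewer rows.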
For multiple calculations like this, I think NumPy is a better fit. You can reshape the underlying array into the desired format (blocks × rows per block × columns) and calculate on that array as needed.
inp = [{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
{0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
{0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
{0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
{0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
{0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
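import pandas as pd
df = pd.DataFrame(inp)
# Shape (5, 4, 6): 5 blocks of 4 rows, 6 columns; needs the row count to be a multiple of 4
temp = df.values.reshape(-1, 4, df.shape[-1])
# Averaging over axis 1 collapses the 4 rows of each block
out = pd.DataFrame(temp.mean(axis=1))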
Output:
0 1 2 3 4 5
0 17.75 0.25 -19.25 -16.50 -5.50 19.25
1 18.25 0.25 -19.00 -16.00 -5.25 19.50
2 17.75 0.25 -19.25 -16.75 -5.75 19.00
3 17.75 0.25 -19.00 -16.00 -4.75 19.75
4 17.75 0.25 -18.75 -14.75 -3.75 21.00
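Once the array has the (blocks, rows per block, columns) shape, each extra calculation is just another reduction over axis 1. This is a minimal sketch, assuming only the first eight rows of the data; the "range" entry stands in for a hypothetical custom calculation, and dropping leftover rows when the row count is not a multiple of the window size is an assumption (padding them would be another option).

```python
import numpy as np
import pandas as pd

# First 8 rows of the example data
rows = [
    [18, 1, -19, -16, -5, 19],
    [18, 0, -19, -17, -6, 19],
    [17, -1, -20, -17, -6, 19],
    [18, 1, -19, -16, -5, 20],
    [18, 0, -19, -16, -5, 20],
    [18, 0, -20, -15, -4, 20],
    [19, 1, -18, -16, -5, 20],
    [18, 0, -19, -17, -7, 18],
]
df = pd.DataFrame(rows)

window_size = 4
n_blocks = df.shape[0] // window_size
# reshape requires an exact multiple of window_size, so trailing rows are dropped
arr = df.to_numpy()[: n_blocks * window_size]
blocks = arr.reshape(n_blocks, window_size, df.shape[1])  # (blocks, rows, columns)

# Every reduction over axis=1 collapses the rows of a block -> (blocks, columns)
results = {
    "min": pd.DataFrame(blocks.min(axis=1), columns=df.columns),
    "max": pd.DataFrame(blocks.max(axis=1), columns=df.columns),
    "mean": pd.DataFrame(blocks.mean(axis=1), columns=df.columns),
    # Hypothetical custom calculation: spread (max - min) within each block
    "range": pd.DataFrame(blocks.max(axis=1) - blocks.min(axis=1), columns=df.columns),
}
print(results["mean"])
```

Because every statistic is computed on the same reshaped array, adding another one is a single vectorized call rather than another pass over the DataFrame.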