I have a dataframe of 1.1M rows that I need to process in the following rolling manner:
Suppose a window size of 2 and the following input:
A B
0 "This" 3
1 "is" 4
2 "a" 5
3 "test" 6
The output would be:
A_1 A_2 B_1 B_2
0 "This" "is" 3 4
1 "is" "a" 4 5
2 "a" "test" 5 6
I am currently doing this by iterating over the dataframe, but at that rate processing the full dataset would take about 3 hours.
Is there a more efficient way to do this?
One idea is to build a strided array of window positions and use it to index each column separately, because the columns have different types (strings and integers):
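import numpy as np
import pandas as pd

def rolling_window(a, window):
    # View of `a` where row i holds a[i], ..., a[i + window - 1]; no data is copied
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)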
N = 2
a = rolling_window(np.arange(len(df)), N)
print (a)
[[0 1]
[1 2]
[2 3]]
# Index each column with the window positions, then suffix the column names
out = pd.concat([pd.DataFrame(df[x].to_numpy()[a]).rename(columns=lambda y: f'{x}_{y + 1}')
                 for x in df.columns], axis=1)
print(out)
A_1 A_2 B_1 B_2
0 This is 3 4
1 is a 4 5
2 a test 5 6
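A self-contained version of the N = 2 example on the question's sample data is sketched below. It swaps the hand-written rolling_window for numpy.lib.stride_tricks.sliding_window_view (available since NumPy 1.20), which builds the same window positions but checks the window size and returns a read-only view; that substitution is an assumption of this sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Sample frame from the question
df = pd.DataFrame({"A": ["This", "is", "a", "test"], "B": [3, 4, 5, 6]})
N = 2

# Row positions of each window: [[0 1] [1 2] [2 3]]; requires NumPy >= 1.20
a = sliding_window_view(np.arange(len(df)), N)

# Index each column separately so every output column keeps its own dtype
out = pd.concat(
    [pd.DataFrame(df[x].to_numpy()[a]).rename(columns=lambda y: f"{x}_{y + 1}")
     for x in df.columns],
    axis=1,
)
print(out)
```

Because the fancy indexing df[x].to_numpy()[a] returns a copy, the read-only view is never written to, so swapping in sliding_window_view does not change the result.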
If you need a window size of 3:
N = 3
a = rolling_window(np.arange(len(df)), N)
print (a)
[[0 1 2]
[1 2 3]]
# Index each column with the window positions, then suffix the column names
out = pd.concat([pd.DataFrame(df[x].to_numpy()[a]).rename(columns=lambda y: f'{x}_{y + 1}')
                 for x in df.columns], axis=1)
print(out)
A_1 A_2 A_3 B_1 B_2 B_3
0 This is a 3 4 5
1 is a test 4 5 6
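If you prefer to stay in pandas, a different technique is to take positional slices of each column that start k rows later and realign them on a fresh 0..rows-1 index. The helper name rolling_wide is hypothetical, and this sketch has not been benchmarked against the strided version.

```python
import pandas as pd

def rolling_wide(df, n):
    """Hypothetical helper: each output row holds n consecutive input rows."""
    if not 1 <= n <= len(df):
        raise ValueError("window size must be between 1 and len(df)")
    rows = len(df) - n + 1  # number of complete windows
    pieces = {}
    for col in df.columns:
        for k in range(n):
            # Positional slice starting k rows later; reset_index aligns the
            # slices on 0..rows-1 so they sit side by side in the result.
            pieces[f"{col}_{k + 1}"] = df[col].iloc[k:k + rows].reset_index(drop=True)
    return pd.DataFrame(pieces)

df = pd.DataFrame({"A": ["This", "is", "a", "test"], "B": [3, 4, 5, 6]})
print(rolling_wide(df, 3))
```

Unlike shift(), which pads with NaN and would turn the integer column B into floats, slicing keeps each column's original dtype.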
The NumPy-based solution performs well. Test data with 8 columns and 400k rows:
# 8 columns, 400k rows
df = pd.concat([df] * 4, ignore_index=True, axis=1)
df.columns = list('ABCDEFGH')
df = pd.concat([df] * 100000, ignore_index=True)
In [53]: %%timeit
...: a = rolling_window(np.arange(len(df)), 2)
...: pd.concat([pd.DataFrame(df[x].to_numpy()[a]).rename(columns=lambda y: f'{x}_{y + 1}') for x in df.columns], axis=1)
167 ms ± 741 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
For comparison, building each output column from a Python list:
In [54]: %%timeit
...: window = 2
...: pd.DataFrame({ f'{col}_{i}': list(df[col][i-1:len(df)-window+i]) for col in df.columns for i in range(1,window+1) })
1.52 s ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On this data the strided version is roughly 9 times faster (167 ms vs 1.52 s per loop).
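%%timeit is an IPython magic; outside IPython the standard-library timeit module gives comparable measurements. The sketch below rebuilds the 8-column test frame from the question's sample, checks that the strided approach and the list-based approach produce the same values, and times one call of each. The helper names make_frame, strided, list_based and compare are hypothetical, the list-based version uses .iloc to make its positional slicing explicit, and actual timings will depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_frame(reps):
    # 8 columns (A..H) built from the question's 4-row sample, repeated `reps` times
    base = pd.DataFrame({"A": ["This", "is", "a", "test"], "B": [3, 4, 5, 6]})
    wide = pd.concat([base] * 4, ignore_index=True, axis=1)
    wide.columns = list("ABCDEFGH")
    return pd.concat([wide] * reps, ignore_index=True)


def strided(df, window):
    # Window positions, then fancy-index each column separately
    a = sliding_window_view(np.arange(len(df)), window)
    return pd.concat(
        [pd.DataFrame(df[x].to_numpy()[a]).rename(columns=lambda y: f"{x}_{y + 1}")
         for x in df.columns],
        axis=1,
    )


def list_based(df, window):
    # Each output column is a Python list built from a positional slice
    return pd.DataFrame({
        f"{col}_{i}": list(df[col].iloc[i - 1:len(df) - window + i])
        for col in df.columns for i in range(1, window + 1)
    })


def compare(reps=100_000, window=2, number=1):
    df = make_frame(reps)
    # Same values and column order; dtypes may differ across pandas versions
    pd.testing.assert_frame_equal(
        strided(df, window), list_based(df, window),
        check_dtype=False, check_column_type=False,
    )
    t_strided = timeit.timeit(lambda: strided(df, window), number=number) / number
    t_list = timeit.timeit(lambda: list_based(df, window), number=number) / number
    return t_strided, t_list  # seconds per call
```

Calling compare() with its default reps=100_000 reproduces the 400k-row setup; a smaller reps value is enough for a quick sanity check.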
Java supports anonymous arrays, so you don't need to declare an array variable before passing an array to a method.
CODE
//@GIOCHE
public class TestAnonymousArray {
    /* creating a method which receives
       an array as a parameter */
    static void printArray(int arr[]) {
        for (int i = 0; i < arr.length; i++)
            System.out.println(arr[i]);
    }

    public static void main(String args[]) {
        // Passing an anonymous array to the method
        printArray(new int[] {10, 22, 44, 66});
    }
}
OUTPUT
10
22
44
66