pandas: from a two-column DataFrame to a (time series) multi-column DataFrame

Question

Suppose we have a DataFrame that looks like this:

df = pd.DataFrame(columns=['A', 'B','C'])
df.loc[0]=[1,2,3]
df.loc[1]=[4,5,6]
df.loc[2]=[7,8,9]
df.loc[3]=[10,11,12]
df.loc[4]=[13,14,15]
df.loc[5]=[16,17,18]
df.loc[6]=[19,20,21]
df


    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
4  13  14  15
5  16  17  18
6  19  20  21

I want to transform df into df2:

df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
df2.loc[0]=[1,2,4,5,7,8]
df2.loc[1]=[4,5,7,8,10,11]
df2.loc[2]=[7,8,10,11,13,14]
df2.loc[3]=[10,11,13,14,16,17]
df2.loc[4]=[13,14,16,17,19,20]
df2

   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20

That is, the first row of df2 should hold the first two columns (A and B) of rows 0 to 2 of df, flattened row by row. The second row of df2 should hold the same two columns of rows 1 to 3 of df, and so on, moving down one row each time.

How can I get from df to df2? I can do basic manipulation, but this still looks hard to me.

Can anyone help me, please?

Answer 1

You can use strides: flatten the first two columns into a 1-D array with ravel, build every length-6 window over it, and keep every second window with [::2] so that each one starts at the beginning of a row.

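import numpy as np
import pandas as pd

def rolling_window(a, window):
    # Strided view of every length-`window` slice along the last axis of `a`
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)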

a = rolling_window(df[['A','B']].to_numpy().ravel(), 6)[::2]
print (a)
[[1 2 4 5 7 8]
 [4 5 7 8 10 11]
 [7 8 10 11 13 14]
 [10 11 13 14 16 17]
 [13 14 16 17 19 20]]

df2 = pd.DataFrame(a, columns=['first', 'second','third','fourth','fifth','sixth'])
print (df2)
  first second third fourth fifth sixth
0     1      2     4      5     7     8
1     4      5     7      8    10    11
2     7      8    10     11    13    14
3    10     11    13     14    16    17
4    13     14    16     17    19    20
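
On NumPy 1.20 or later, the same windows can likely be built without hand-computed strides by using np.lib.stride_tricks.sliding_window_view, which returns a read-only view. The following is a minimal sketch, not part of the original answer. It assumes the df from the question, rebuilt here with integer columns so the block runs on its own; the names flat and windows are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: rows [1, 2, 3], [4, 5, 6], ..., [19, 20, 21].
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...].
flat = df[['A', 'B']].to_numpy().ravel()

# Every length-6 window over the flat array; keep every second one so that
# each window starts at the beginning of a row of df.
windows = np.lib.stride_tricks.sliding_window_view(flat, 6)[::2]

df2 = pd.DataFrame(
    windows,
    columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'],
)
print(df2)
```

Unlike as_strided, sliding_window_view checks the window size against the array shape, so a wrong window length raises an error instead of reading past the buffer.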

Answer 2

Use NumPy: flatten the first two columns and take a 6-element slice starting at every second position:

import numpy as np
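# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...]
new = df.values[:, :2].reshape(-1)
# One 6-element window per starting row; len(new) // 2 is the number of rows in df
l = [new[2*i:2*i+6] for i in range(len(new) // 2 - 2)]
l = np.array(l)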
df2 = pd.DataFrame(l, columns=['first', 'second','third','fourth','fifth','sixth'])
print(df2)

'''
Output:
  first second third fourth fifth sixth
0     1      2     4      5     7     8
1     4      5     7      8    10    11
2     7      8    10     11    13    14
3    10     11    13     14    16    17
4    13     14    16     17    19    20
'''
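
The same result can probably be reached without leaving pandas, by placing three row-shifted copies of columns A and B side by side. This is a sketch under the assumption that df matches the question (rebuilt here so the block is self-contained); window, sub, n_out and pieces are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: rows [1, 2, 3], [4, 5, 6], ..., [19, 20, 21].
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

window = 3                      # rows of df combined into one row of df2
sub = df[['A', 'B']]
n_out = len(sub) - window + 1   # 7 - 3 + 1 = 5 output rows

# Piece k holds rows k .. k + n_out - 1, re-indexed from 0 so the pieces align.
pieces = [sub.iloc[k:k + n_out].reset_index(drop=True) for k in range(window)]

# Put the pieces next to each other, then rename the six columns.
df2 = pd.concat(pieces, axis=1)
df2.columns = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
print(df2)
```

Slicing with iloc rather than shift avoids introducing NaN values, so the integer dtype is preserved.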

Answer 3

A simpler solution is to drop column "C" and then join three consecutive rows, as lists, to make each row of df2.

The code looks like this:

# Drop column C (note: this modifies df itself)
df.drop(columns=['C'], inplace=True)

df2 = pd.DataFrame(columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'])

# Row i of df2 is rows i, i+1 and i+2 of df joined end to end
for i in range(len(df) - 2):
    df2.loc[i] = list(df.loc[i]) + list(df.loc[i+1]) + list(df.loc[i+2])

print(df2)
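
Assigning df2.loc[i] inside a loop grows the frame one row at a time, which tends to be slow on large inputs. One possible variation is to collect the rows in a list and build df2 in a single call. This is a sketch only: stack_rows is a hypothetical helper name, not a pandas function, and df is rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

def stack_rows(frame, window, names):
    """Hypothetical helper: join `window` consecutive rows of `frame` into one row."""
    values = frame.to_numpy()
    # values[i:i + window] is a (window, n_cols) block; ravel() flattens it row by row.
    rows = [values[i:i + window].ravel() for i in range(len(values) - window + 1)]
    return pd.DataFrame(rows, columns=names)

# Rebuild the question's frame: rows [1, 2, 3], [4, 5, 6], ..., [19, 20, 21].
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

names = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
df2 = stack_rows(df[['A', 'B']], 3, names)   # leaves df itself unchanged
print(df2)
```

Selecting df[['A', 'B']] instead of dropping "C" in place keeps the original frame intact, and the window length can be changed as long as names has window * 2 entries.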

3 answers

solution1
1 2020-08-14 08:38:24

solution2
1 2020-08-14 08:41:59

solution3
1 ACCEPTED 2020-08-14 09:20:26