Suppose we have a DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame(columns=['A', 'B','C'])
df.loc[0]=[1,2,3]
df.loc[1]=[4,5,6]
df.loc[2]=[7,8,9]
df.loc[3]=[10,11,12]
df.loc[4]=[13,14,15]
df.loc[5]=[16,17,18]
df.loc[6]=[19,20,21]
df
    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
4  13  14  15
5  16  17  18
6  19  20  21
I want to transform df into df2:
df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
df2.loc[0]=[1,2,4,5,7,8]
df2.loc[1]=[4,5,7,8,10,11]
df2.loc[2]=[7,8,10,11,13,14]
df2.loc[3]=[10,11,13,14,16,17]
df2.loc[4]=[13,14,16,17,19,20]
df2
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
That is, I want to fill the first row of df2 with the first three rows of the first two columns of df (rows 0 to 2 of A and B). The second row of df2 is then filled with the three rows starting one row further down (rows 1 to 3 of A and B), and so on.
How can I get from df to df2? I can do some simple, elementary manipulation, but this still looks hard to me.
Can anyone help me, please?
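You can use strides: convert the first two columns to a 1-D array with ravel, build rolling windows of length 6 over it, and then keep every second window by indexing with [::2], so that each window starts at the beginning of a row.
import numpy as np

def rolling_window(a, window):
    # Strided view of every length-`window` window along the last axis (no copy).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Flatten A and B row by row, take windows of 6 values, keep every second one.
a = rolling_window(df[['A','B']].to_numpy().ravel(), 6)[::2]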
print (a)
[[1 2 4 5 7 8]
[4 5 7 8 10 11]
[7 8 10 11 13 14]
[10 11 13 14 16 17]
[13 14 16 17 19 20]]
df2 = pd.DataFrame(a, columns=['first', 'second','third','fourth','fifth','sixth'])
print (df2)
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
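On NumPy 1.20 or later, the hand-written rolling_window helper can likely be replaced by numpy.lib.stride_tricks.sliding_window_view, which builds the same kind of windowed view with bounds checking. The following is a minimal sketch under that assumption; for brevity it builds df from a list of rows rather than with df.loc, which is a convenience not taken from the question.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Same data as in the question, built in one call (an assumption for brevity).
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
     [13, 14, 15], [16, 17, 18], [19, 20, 21]],
    columns=['A', 'B', 'C'],
)

# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...]
flat = df[['A', 'B']].to_numpy().ravel()

# All windows of 6 consecutive values; every second window starts on a row boundary.
windows = sliding_window_view(flat, 6)[::2]

# The view is read-only, so copy it before handing it to pandas.
df2 = pd.DataFrame(
    windows.copy(),
    columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'],
)
print(df2)
```

The step of 2 in [::2] matches the number of columns kept (A and B), and the window length 6 is 3 rows times 2 columns.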
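With NumPy, flatten the first two columns and slice out overlapping windows of six values:
import numpy as np
# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...]
new = df.values[:, :2].reshape(-1)
# Each window starts 2 values (one row of df) after the previous one and spans 3 rows.
l = [new[2*i:2*i+6] for i in range(new.shape[0] // 2 - 2)]
l = np.array(l)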
df2 = pd.DataFrame(l, columns=['first', 'second','third','fourth','fifth','sixth'])
print(df2)
'''
Output:
   first  second  third  fourth  fifth  sixth
0      1       2      4       5      7      8
1      4       5      7       8     10     11
2      7       8     10      11     13     14
3     10      11     13      14     16     17
4     13      14     16      17     19     20
'''
A simpler solution is to drop column "C" and then join three consecutive rows, as lists, to make each row of df2.
The code looks like this:
# Drop column C in place (this modifies df itself).
df.drop(['C'], axis=1, inplace=True)
df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
for i in range(len(df.A) - 2):
    # Row i of df2 is rows i, i+1 and i+2 of df joined end to end.
    df2.loc[i] = list(df.loc[i]) + list(df.loc[i+1]) + list(df.loc[i+2])
print(df2)
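The loop above modifies df in place and grows df2 one row at a time. If that is a concern, the same result can probably be reached in pandas alone by shifting the two columns and concatenating them side by side. This is only a sketch: the names pairs and parts are illustrative, and df is built from a list of rows for brevity.

```python
import pandas as pd

# Same data as in the question, built in one call (an assumption for brevity).
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
     [13, 14, 15], [16, 17, 18], [19, 20, 21]],
    columns=['A', 'B', 'C'],
)

# Select A and B without modifying df.
pairs = df[['A', 'B']]

# shift(-k) moves row i+k up to position i; rows shifted past the end become NaN.
parts = [pairs.shift(-k) for k in range(3)]

# Put the three shifted copies side by side, then name the six columns.
df2 = pd.concat(parts, axis=1)
df2.columns = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']

# The last two rows have no complete window, so drop them and restore integers.
df2 = df2.dropna().astype(int)
print(df2)
```

Shifting introduces NaN, which turns the columns into floats, hence the final astype(int).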