
pandas: from a two-column DataFrame to a (time series) multi-column DataFrame

Suppose we have a DataFrame that looks like this:

df = pd.DataFrame(columns=['A', 'B','C'])
df.loc[0]=[1,2,3]
df.loc[1]=[4,5,6]
df.loc[2]=[7,8,9]
df.loc[3]=[10,11,12]
df.loc[4]=[13,14,15]
df.loc[5]=[16,17,18]
df.loc[6]=[19,20,21]
df


    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3   10  11  12
4   13  14  15
5   16  17  18
6   19  20  21

I want to transform df into df2:

df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
df2.loc[0]=[1,2,4,5,7,8]
df2.loc[1]=[4,5,7,8,10,11]
df2.loc[2]=[7,8,10,11,13,14]
df2.loc[3]=[10,11,13,14,16,17]
df2.loc[4]=[13,14,16,17,19,20]
df2

    first   second  third   fourth  fifth   sixth
0   1       2       4       5       7       8
1   4       5       7       8       10      11
2   7       8       10      11      13      14
3   10      11      13      14      16      17
4   13      14      16      17      19      20

That is, I want to fill the first row of df2 with the first three rows of the first two columns of df (rows 0 to 2). The second row of df2 is then filled from rows 1 to 3 of those two columns, so the three-row window slides down by one row each time, and so on.

How can I get from df to df2? I can do some elementary manipulation, but this still looks hard to me.

Can anyone help me, please?

You can use strides: first convert the first 2 columns to a 1d array with ravel, build rolling windows of 6 values, and then keep every second window by indexing with [::2], so that each window starts at the beginning of a row of df.

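import numpy as np

def rolling_window(a, window):
    # View of all length-`window` slices along the last axis (no data is copied).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Flatten columns A and B row by row, take windows of 6 values, keep every second one.
a = rolling_window(df[['A','B']].to_numpy().ravel(), 6)[::2]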
print (a)
[[1 2 4 5 7 8]
 [4 5 7 8 10 11]
 [7 8 10 11 13 14]
 [10 11 13 14 16 17]
 [13 14 16 17 19 20]]

df2 = pd.DataFrame(a, columns=['first', 'second','third','fourth','fifth','sixth'])
print (df2)
  first second third fourth fifth sixth
0     1      2     4      5     7     8
1     4      5     7      8    10    11
2     7      8    10     11    13    14
3    10     11    13     14    16    17
4    13     14    16     17    19    20
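
As a variation on the strides approach above, NumPy 1.20 and later provide sliding_window_view, which builds the same windows without computing strides by hand. The following is only a sketch: it assumes a recent NumPy, and it rebuilds the question's df with np.arange so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Rebuild the example frame from the question (columns A, B, C; values 1..21).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...]
flat = df[['A', 'B']].to_numpy().ravel()

# All windows of 6 consecutive values; keep every second window so that
# each one starts at the beginning of a row of df. The view is read-only,
# so copy it before handing it to pandas.
windows = sliding_window_view(flat, 6)[::2].copy()

df2 = pd.DataFrame(windows, columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'])
print(df2)
```

Unlike as_strided, sliding_window_view checks the window size against the array length, so a window that is too long raises an error instead of reading memory outside the array.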

Using NumPy:

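import numpy as np
# Flatten columns A and B row by row into a 1-D array.
new = df.values[:, :2].reshape(-1)
# Each window covers 3 rows (6 values) and starts 2 values after the previous one.
l = [new[2*i:2*i+6] for i in range(new.shape[0] // 2 - 2)]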
l = np.array(l)
df2 = pd.DataFrame(l, columns=['first', 'second','third','fourth','fifth','sixth'])
print(df2)

'''
Output:
  first second third fourth fifth sixth
0     1      2     4      5     7     8
1     4      5     7      8    10    11
2     7      8    10     11    13    14
3    10     11    13     14    16    17
4    13     14    16     17    19    20
'''
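
The index arithmetic in this answer is tied to 2 columns and a window of 3 rows. A possible generalization is sketched below; the helper name stack_windows and its parameters are made up for illustration, and df is rebuilt with np.arange so that the snippet is self-contained.

```python
import numpy as np
import pandas as pd

def stack_windows(frame, cols, window):
    """Hypothetical helper: one output row per run of `window` consecutive rows of frame[cols]."""
    values = frame[cols].to_numpy()
    n_rows, n_cols = values.shape
    flat = values.reshape(-1)
    # Window i starts at flat position n_cols * i and spans n_cols * window values.
    rows = [flat[n_cols * i:n_cols * (i + window)] for i in range(n_rows - window + 1)]
    return np.array(rows)

# Rebuild the example frame from the question (columns A, B, C; values 1..21).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

result = stack_windows(df, ['A', 'B'], 3)
df2 = pd.DataFrame(result, columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'])
print(df2)
```

With 7 rows and a window of 3 there are 7 - 3 + 1 = 5 windows, which matches the 5 rows of the expected df2.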

A simpler solution could be to drop column "C" and then join three consecutive rows, as lists, to make each row of df2.

The code looks like this:

df.drop(['C'] ,axis = 1 , inplace = True)

df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])

for i in range(0,len(df.A) - 2):
    df2.loc[i] = list(df.loc[i]) + list(df.loc[i+1]) + list(df.loc[i+2])

print(df2)
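
The same row-joining idea can also be written without an explicit loop by placing shifted copies of the two columns side by side. This is only a sketch of an alternative, not part of the answer above; it assumes the input has no missing values (dropna would remove such rows too), and it rebuilds df with np.arange so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (columns A, B, C; values 1..21).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

# Selecting the two columns leaves df itself unchanged (no in-place drop).
pairs = df[['A', 'B']]

# shift(-k) moves row i + k up to position i, so concatenating the shifted
# copies side by side puts rows i, i + 1 and i + 2 on the same line.
wide = pd.concat([pairs.shift(-k) for k in range(3)], axis=1)
wide.columns = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']

# The last two rows have no complete window and contain NaN; drop them and
# restore the integer dtype that the shift turned into float.
df2 = wide.dropna().astype(int)
print(df2)
```

Growing a DataFrame one row at a time with .loc tends to be slow on large inputs, so a vectorized form like this may be preferable when df has many rows.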


 