pandas: from a two-column DataFrame to a (time series) multi-column DataFrame
Suppose we have a DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame(columns=['A', 'B','C'])
df.loc[0]=[1,2,3]
df.loc[1]=[4,5,6]
df.loc[2]=[7,8,9]
df.loc[3]=[10,11,12]
df.loc[4]=[13,14,15]
df.loc[5]=[16,17,18]
df.loc[6]=[19,20,21]
df
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
4 13 14 15
5 16 17 18
6 19 20 21
I want to transform df into df2:
df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
df2.loc[0]=[1,2,4,5,7,8]
df2.loc[1]=[4,5,7,8,10,11]
df2.loc[2]=[7,8,10,11,13,14]
df2.loc[3]=[10,11,13,14,16,17]
df2.loc[4]=[13,14,16,17,19,20]
df2
first second third fourth fifth sixth
0 1 2 4 5 7 8
1 4 5 7 8 10 11
2 7 8 10 11 13 14
3 10 11 13 14 16 17
4 13 14 16 17 19 20
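That is, I want to fill the first row of df2 with the first three rows of the first two columns of df. The second row of df2 is then filled with the next three rows of those two columns, where the window moves down by one row of df each time (rows 1 to 3, then rows 2 to 4), and so on.
How can I get from df to df2? I can do some basic, simple operations, but this is still hard for me.
Can anyone help?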
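You can use strides: convert the first two columns to a 1-D array with ravel, build rolling windows of length 6 over it,
and then select every second window by indexing with [::2], so that each window starts at the beginning of a row of df.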
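import numpy as np

def rolling_window(a, window):
    # One output row per window position along the last axis
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # Reuse the last-axis stride so consecutive windows overlap without copying
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = rolling_window(df[['A','B']].to_numpy().ravel(), 6)[::2]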
print (a)
[[1 2 4 5 7 8]
[4 5 7 8 10 11]
[7 8 10 11 13 14]
[10 11 13 14 16 17]
[13 14 16 17 19 20]]
df2 = pd.DataFrame(a, columns=['first', 'second','third','fourth','fifth','sixth'])
print (df2)
first second third fourth fifth sixth
0 1 2 4 5 7 8
1 4 5 7 8 10 11
2 7 8 10 11 13 14
3 10 11 13 14 16 17
4 13 14 16 17 19 20
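On NumPy 1.20 or later, the same windows can probably be built with `numpy.lib.stride_tricks.sliding_window_view`, which avoids computing shapes and strides by hand. This is a minimal sketch, not the answerer's code; it assumes the example frame is rebuilt with an integer dtype (the `np.arange` construction is only for illustration):

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Rebuild the example frame from the question (assumption: integer dtype).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

# Flatten columns A and B row by row: [1, 2, 4, 5, 7, 8, ...].
flat = df[['A', 'B']].to_numpy().ravel()

# Take every window of 6 consecutive values, then keep every second one
# so that each window starts at the beginning of a row of df.
# The view is read-only, so copy it before handing it to pandas.
windows = sliding_window_view(flat, 6)[::2].copy()

df2 = pd.DataFrame(
    windows,
    columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'],
)
print(df2)
```

The step of 2 in `[::2]` equals the number of selected columns; with a different number of columns, both the window length and the step would need to change accordingly.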
Using NumPy:
import numpy as np
new = df.values[:, :2].reshape(-1)
l = [new[2*i:2*i+6] for i in range(new.shape[0] // 2 - 2)]
l = np.array(l)
df2 = pd.DataFrame(l, columns=['first', 'second','third','fourth','fifth','sixth'])
print(df2)
'''
Output:
first second third fourth fifth sixth
0 1 2 4 5 7 8
1 4 5 7 8 10 11
2 7 8 10 11 13 14
3 10 11 13 14 16 17
4 13 14 16 17 19 20
'''
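The slicing arithmetic in this answer (start at `2*i`, take 6 values, produce `len/2 - 2` rows) is tied to two columns and three rows per window. A hedged sketch of how it might be generalized follows; the helper name `stack_rows` and its parameters are hypothetical and do not appear in the original answer:

```python
import numpy as np
import pandas as pd

def stack_rows(frame, cols, n_rows):
    """Hypothetical helper: lay n_rows consecutive rows of `cols` side by side."""
    width = len(cols)                           # values contributed by one row
    flat = frame[cols].to_numpy().reshape(-1)   # row-major 1-D array
    n_out = len(frame) - n_rows + 1             # number of complete windows
    # Window i starts at row i and spans n_rows rows of `width` values each.
    rows = [flat[width * i : width * (i + n_rows)] for i in range(n_out)]
    return np.array(rows)

# Rebuild the example frame from the question (assumption: integer dtype).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

out = stack_rows(df, ['A', 'B'], 3)
df2 = pd.DataFrame(
    out,
    columns=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'],
)
print(df2)
```

With 7 rows and a window of 3 rows this gives 7 - 3 + 1 = 5 output rows, which matches the `len/2 - 2` count used above for the 14-element flattened array.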
A simpler solution may be to drop column 'C' and then concatenate three consecutive rows, as lists, to form each row of df2.
The code is as follows:
df.drop(['C'] ,axis = 1 , inplace = True)
df2 = pd.DataFrame(columns=['first', 'second','third','fourth','fifth','sixth'])
for i in range(0, len(df.A) - 2):
    # Rows i, i+1 and i+2 of df, side by side, form row i of df2
    df2.loc[i] = list(df.loc[i]) + list(df.loc[i+1]) + list(df.loc[i+2])
print(df2)
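For larger frames, assigning rows one at a time with `.loc` can be slow. One possible loop-free alternative in plain pandas is to line up shifted copies of the two columns with `shift` and `concat`. This is a sketch under the assumption that the frame holds integers and contains no missing values of its own, since `dropna` is used to discard the incomplete trailing windows:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumption: integer dtype).
df = pd.DataFrame(np.arange(1, 22).reshape(7, 3), columns=['A', 'B', 'C'])

pairs = df[['A', 'B']]

# shift(-k) moves row i+k up to position i, so the three shifted copies
# place rows i, i+1 and i+2 on the same index label.
wide = pd.concat([pairs.shift(-k) for k in range(3)], axis=1)
wide.columns = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']

# The last two rows contain NaN (no rows left to shift in); drop them
# and restore the integer dtype that shifting turned into float.
df2 = wide.dropna().astype(int)
print(df2)
```

Unlike the loop above, this version leaves `df` unmodified because it selects columns A and B instead of dropping C in place.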