Pandas merging rows / Dataframe Transformation
I have this example DataFrame:
e col1 col2 col3
1 238.4 238.7 238.2
2 238.45 238.75 238.2
3 238.2 238.25 237.95
4 238.1 238.15 238.05
5 238.1 238.1 238
6 229.1 229.05 229.05
7 229.35 229.35 229.1
8 229.1 229.15 229
9 229.05 229.05 229
How would I be able to convert it to this:
        1                       2                       3
     col1    col2    col3    col1    col2    col3    col1    col2    col3
1   238.4   238.7   238.2  238.45  238.75   238.2   238.2  238.25  237.95
2   238.1  238.15  238.05   238.1   238.1     238   229.1  229.05  229.05
3  229.35  229.35   229.1   229.1  229.15     229  229.05  229.05     229
I am thinking maybe I should pivot by counting with len or by assigning an index based on multiples of 3, but I am really not sure what the most efficient way would be.
Create a grouping series g that assigns every third row (a step size of 3) to the same group, and use np.unique to get the unique grouping keys k. Next, group the dataframe on g with DataFrame.groupby and use set_index to set the index of each grouped frame to k. Finally, use pd.concat to concatenate the grouped frames along axis=1, passing the optional parameter keys=k to create MultiIndex columns:
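g = df.pop('e').sub(1) % 3 + 1  # position within each block of three: 1, 2, 3, 1, 2, 3, ...
k = np.unique(g)                # computed on its own line, since g must exist before it is used
df1 = pd.concat([grp.set_index(k) for _, grp in df.groupby(g)], keys=k, axis=1)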
Details:
print(g.tolist())
[1, 2, 3, 1, 2, 3, 1, 2, 3]
print(k)
[1 2 3]
Result:
print(df1)
        1                       2                       3
     col1    col2    col3    col1    col2    col3    col1    col2    col3
1  238.40  238.70  238.20  238.45  238.75   238.2  238.20  238.25  237.95
2  238.10  238.15  238.05  238.10  238.10   238.0  229.10  229.05  229.05
3  229.35  229.35  229.10  229.10  229.15   229.0  229.05  229.05  229.00
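For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's sample data; the names `grp` and `parts` are only illustrative. Note that reusing `k` as the new index works here because there happen to be three rows per group as well as three groups.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "e": range(1, 10),
    "col1": [238.4, 238.45, 238.2, 238.1, 238.1, 229.1, 229.35, 229.1, 229.05],
    "col2": [238.7, 238.75, 238.25, 238.15, 238.1, 229.05, 229.35, 229.15, 229.05],
    "col3": [238.2, 238.2, 237.95, 238.05, 238.0, 229.05, 229.1, 229.0, 229.0],
})

# Position of each row within its block of three: 1, 2, 3, 1, 2, 3, ...
g = df.pop("e").sub(1) % 3 + 1
k = np.unique(g)

# One sub-frame per position, re-indexed with k, then placed side by side.
parts = [grp.set_index(k) for _, grp in df.groupby(g)]
df1 = pd.concat(parts, keys=k, axis=1)

print(df1)
```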
The data is laid out in steps of three, so we slice it in steps of 3 and then concatenate the slices along the columns axis:
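# Assumes df holds only col1..col3 ('e' already popped or set as the index), so df.shape[1] == 3.
pd.concat([df.iloc[n::3]                              # every third row, starting at row n
             .reset_index(drop=True)
             .set_index(pd.Index([index] * 3), append=True)
             .unstack()                               # move the block label into the columns
             .swaplevel(1, 0, axis=1)                 # put the block label on the top level
           for n, index in zip(range(0, df.shape[0] // df.shape[1]),
                               range(1, df.shape[1] + 1))],
          axis=1)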
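The slicing idea can also be tried in isolation. The sketch below is a reduced variant, not the exact code above: it keeps the `iloc[n::3]` slicing but labels the blocks with the `keys=` argument of `pd.concat` instead of the set_index / unstack / swaplevel chain. The names `step`, `blocks` and `out` are illustrative, and `e` is assumed to be the index.

```python
import pandas as pd

# Sample data copied from the question, with 'e' as the index (an assumption).
df = pd.DataFrame(
    {
        "col1": [238.4, 238.45, 238.2, 238.1, 238.1, 229.1, 229.35, 229.1, 229.05],
        "col2": [238.7, 238.75, 238.25, 238.15, 238.1, 229.05, 229.35, 229.15, 229.05],
        "col3": [238.2, 238.2, 237.95, 238.05, 238.0, 229.05, 229.1, 229.0, 229.0],
    },
    index=pd.Index(range(1, 10), name="e"),
)

step = 3  # block width, taken from the desired layout in the question

# Rows 0, 3, 6 form block 1; rows 1, 4, 7 form block 2; rows 2, 5, 8 form block 3.
blocks = [df.iloc[n::step].reset_index(drop=True) for n in range(step)]

# keys= puts the block number on the top column level.
out = pd.concat(blocks, axis=1, keys=range(1, step + 1))
out.index += 1  # number the output rows from 1

print(out)
```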
Using pandas methods and a step-by-step approach:
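df['id1'] = (df.e + 2) % 3 + 1              # target column block: 1, 2, 3, 1, 2, 3, ...
df['id2'] = df['id1'].astype(float)         # float, so that NaN can be assigned below
df.loc[df['id1'] > 1, 'id2'] = np.nan       # keep only the 1s that start each output row
df['id2'] = df['id2'].cumsum().ffill()      # target row: 1, 1, 1, 2, 2, 2, 3, 3, 3
df2 = df.drop(columns='e').melt(id_vars=['id1', 'id2'])  # long format: one value per row
df3 = pd.pivot_table(df2, index='id2', columns=['id1', 'variable'], values='value').reset_index(drop=True)
df3.index += 1                              # number the output rows from 1
df3.columns.names = ['', '']                # blank out the column level names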
Result:
        1                       2                       3
     col1    col2    col3    col1    col2    col3    col1    col2    col3
1  238.40  238.70  238.20  238.45  238.75   238.2  238.20  238.25  237.95
2  238.10  238.15  238.05  238.10  238.10   238.0  229.10  229.05  229.05
3  229.35  229.35  229.10  229.10  229.15   229.0  229.05  229.05  229.00
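The two helper columns are the core of this approach, so it may help to look at them on their own. This is a small sketch using a bare Series for `e`; it uses `Series.where` as a shorthand for the "set to NaN where id1 > 1" step, which should be equivalent here.

```python
import pandas as pd

e = pd.Series(range(1, 10), name="e")

# Target column block for each input row.
id1 = (e + 2) % 3 + 1

# Keep the 1s that start each output row, NaN elsewhere,
# then count the row starts and carry the count forward.
id2 = id1.where(id1 == 1).cumsum().ffill()

print(id1.tolist())  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(id2.tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
```

So `id1` says which block of columns a row lands in and `id2` says which output row it lands on; melt and pivot_table then only have to arrange the values by those two keys.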