
Pandas merging rows / DataFrame transformation

I have this example DataFrame:

e   col1    col2    col3
1   238.4   238.7   238.2
2   238.45  238.75  238.2
3   238.2   238.25  237.95
4   238.1   238.15  238.05
5   238.1   238.1   238
6   229.1   229.05  229.05
7   229.35  229.35  229.1
8   229.1   229.15  229
9   229.05  229.05  229

How can I convert it to this:

        1                       2                       3
     col1    col2    col3    col1    col2    col3    col1    col2    col3
1   238.4   238.7   238.2  238.45  238.75   238.2   238.2  238.25  237.95
2   238.1  238.15  238.05   238.1   238.1     238   229.1  229.05  229.05
3  229.35  229.35   229.1   229.1  229.15     229  229.05  229.05     229

I am thinking I could pivot, either by counting with len or by assigning an index based on multiples of 3, but I'm really not sure what the most efficient way would be.
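The index idea from the question can be sketched by giving every row a block number and a position within its block, then unstacking the position into the columns. This is a minimal sketch, assuming e is an ordinary column numbered 1, 2, 3, ... without gaps; the names block and pos are illustrative, not from the question.

```python
import pandas as pd

# Example data from the question; 'e' is assumed to be an ordinary column.
df = pd.DataFrame({
    'e': range(1, 10),
    'col1': [238.4, 238.45, 238.2, 238.1, 238.1, 229.1, 229.35, 229.1, 229.05],
    'col2': [238.7, 238.75, 238.25, 238.15, 238.1, 229.05, 229.35, 229.15, 229.05],
    'col3': [238.2, 238.2, 237.95, 238.05, 238.0, 229.05, 229.1, 229.0, 229.0],
})

block = (df['e'] - 1) // 3 + 1  # output row: 1, 1, 1, 2, 2, 2, 3, 3, 3
pos = (df['e'] - 1) % 3 + 1     # outer column label: 1, 2, 3, 1, 2, 3, ...

out = (df.drop(columns='e')
         .set_index([block.rename('block'), pos.rename('pos')])
         .unstack('pos')            # columns become (col, pos)
         .swaplevel(0, 1, axis=1)   # columns become (pos, col)
         .sort_index(axis=1))       # order them (1, col1), (1, col2), ..., (3, col3)
out.index.name = None               # drop the helper level names for a cleaner print
out.columns.names = [None, None]
print(out)
```

Because set_index/unstack is a pure reshape, each (block, pos) pair must be unique, which holds as long as e has no duplicates.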

Create a grouping series g that assigns every third row (a step size of 3) to the same group, and use np.unique to get the unique grouping keys k. Next, use DataFrame.groupby to group the dataframe on g, and use set_index to set the index of every grouped frame to k (this works here because there are exactly 3 groups of 3 rows each). Finally, use pd.concat to concatenate all the grouped frames along axis=1, passing the optional parameter keys=k to create MultiIndex columns:

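import numpy as np, pandas as pd
g = df.pop('e').sub(1) % 3 + 1  # position of each row within its block of 3: 1, 2, 3, 1, 2, 3, ...
k = np.unique(g)                # on its own line: g must exist before np.unique can use it
df1 = pd.concat([grp.set_index(k) for _, grp in df.groupby(g)], keys=k, axis=1)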

Details:

print(g.tolist())
[1, 2, 3, 1, 2, 3, 1, 2, 3]

print(k)
[1 2 3]

Result:

print(df1)

        1                       2                      3                
     col1    col2    col3    col1    col2   col3    col1    col2    col3
1  238.40  238.70  238.20  238.45  238.75  238.2  238.20  238.25  237.95
2  238.10  238.15  238.05  238.10  238.10  238.0  229.10  229.05  229.05
3  229.35  229.35  229.10  229.10  229.15  229.0  229.05  229.05  229.00
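A self-contained version of this answer can be used to reproduce the output above. As an assumption beyond the original answer, each group is numbered with its own row counter (np.arange) instead of k, so the row index no longer depends on there being exactly 3 groups of 3 rows; the helper name blocks_side_by_side is hypothetical.

```python
import numpy as np
import pandas as pd


def blocks_side_by_side(df: pd.DataFrame, step: int = 3) -> pd.DataFrame:
    """Place every `step` consecutive rows side by side (hypothetical helper).

    Assumes an 'e' column numbered 1, 2, 3, ... and a row count that is a
    multiple of `step`.
    """
    df = df.copy()                      # leave the caller's frame (and its 'e' column) intact
    g = df.pop('e').sub(1) % step + 1   # position within each block: 1, 2, ..., step
    k = np.unique(g)                    # group keys, used as the outer column labels
    parts = [grp.set_index(np.arange(1, len(grp) + 1))  # number each group's rows 1, 2, 3, ...
             for _, grp in df.groupby(g)]
    return pd.concat(parts, keys=k, axis=1)


df = pd.DataFrame({
    'e': range(1, 10),
    'col1': [238.4, 238.45, 238.2, 238.1, 238.1, 229.1, 229.35, 229.1, 229.05],
    'col2': [238.7, 238.75, 238.25, 238.15, 238.1, 229.05, 229.35, 229.15, 229.05],
    'col3': [238.2, 238.2, 237.95, 238.05, 238.0, 229.05, 229.1, 229.0, 229.0],
})

df1 = blocks_side_by_side(df)
print(df1)
```

With the per-group counter, a 6-row frame (two blocks) also works, whereas set_index(k) would fail there because k has 3 entries but each group only 2 rows.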

The data comes in blocks of three rows, so we iterate over it in steps of 3 with df.iloc[n::3] and finally concatenate the pieces along the columns axis:

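import pandas as pd

step = 3                     # rows per block
data = df.drop(columns='e')  # assumes 'e' is a column, as in the question; keep col1..col3
out = pd.concat([data.iloc[n::step]                  # rows n, n+3, n+6, ...
                     .reset_index(drop=True)         # renumber them 0, 1, 2, ...
                     .set_index(pd.Index([n + 1] * (len(data) // step)),
                                append=True)         # tag every row with its block position n+1
                     .unstack()                      # move that tag into the columns: (col, n+1)
                     .swaplevel(0, 1, axis=1)        # reorder to (n+1, col)
                 for n in range(step)],
                axis=1)
out.index += 1               # 1-based row labels, as in the desired output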
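The same iloc[n::3] slicing can be written more compactly by passing a dict to pd.concat: with a mapping, pd.concat uses the dict keys as the outer column level, so the set_index/unstack/swaplevel steps are not needed. This is a sketch of an alternative formulation, not the answer's own code; it assumes 'e' is a column to drop and that the row count is a multiple of 3.

```python
import pandas as pd

df = pd.DataFrame({
    'e': range(1, 10),
    'col1': [238.4, 238.45, 238.2, 238.1, 238.1, 229.1, 229.35, 229.1, 229.05],
    'col2': [238.7, 238.75, 238.25, 238.15, 238.1, 229.05, 229.35, 229.15, 229.05],
    'col3': [238.2, 238.2, 237.95, 238.05, 238.0, 229.05, 229.1, 229.0, 229.0],
})

data = df.drop(columns='e')  # assumes 'e' is a column, as in the question
# Slice n holds rows n, n+3, n+6, ...; with a dict, pd.concat turns the keys
# (1, 2, 3) into the outer column level.
out = pd.concat({n + 1: data.iloc[n::3].reset_index(drop=True) for n in range(3)}, axis=1)
out.index += 1  # 1-based row labels, as in the desired output
print(out)
```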

Using pandas methods, step by step:

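import numpy as np
import pandas as pd

df['id1'] = (df.e + 2) % 3 + 1                  # position within each block of 3: 1, 2, 3, 1, 2, 3, ...
df['id2'] = df['id1'].astype(float)             # float, so the column can hold NaN
df.loc[df['id1'] > 1, 'id2'] = np.nan           # keep a 1 only on the first row of each block
df['id2'] = df['id2'].cumsum().ffill()          # cumsum numbers the blocks 1, 2, 3; ffill fills the NaN rows
df2 = df.drop(columns='e').melt(id_vars=['id1', 'id2'])  # long format: id1, id2, variable, value

# one row per block (id2); columns are (position id1, original column name)
df3 = pd.pivot_table(df2, index='id2', columns=['id1', 'variable'], values='value').reset_index(drop=True)
df3.index += 1                                  # 1-based row labels
df3.columns.names = ['', '']                    # blank level names in the printed output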
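The cumsum().ffill() trick numbers the blocks: id2 keeps a 1 only on each block's first row, cumsum() turns those into 1, 2, 3, and ffill() copies each number onto the next two rows. Assuming e runs 1, 2, 3, ... without gaps, the same labels come from modulo and integer division. In this sketch, id1.where(id1 == 1) is a compact stand-in for the answer's df.loc[...] = np.nan step, and the *_simple names are illustrative.

```python
import pandas as pd

e = pd.Series(range(1, 10), name='e')  # the question's 'e' column: 1..9

# The answer's construction
id1 = (e + 2) % 3 + 1                             # position in block: 1, 2, 3, 1, ...
id2_answer = id1.where(id1 == 1).cumsum().ffill()  # 1 on block starts, NaN elsewhere, then number and fill

# Modulo / integer-division equivalents (assume e starts at 1 and has no gaps)
id1_simple = (e - 1) % 3 + 1
id2_simple = (e - 1) // 3 + 1

print(pd.DataFrame({'id1': id1, 'id2_answer': id2_answer, 'id2_simple': id2_simple}))
```

Note that (e + 2) % 3 and (e - 1) % 3 are the same value, since e + 2 and e - 1 differ by 3.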

Result:

        1                       2                      3                
     col1    col2    col3    col1    col2   col3    col1    col2    col3
1  238.40  238.70  238.20  238.45  238.75  238.2  238.20  238.25  237.95
2  238.10  238.15  238.05  238.10  238.10  238.0  229.10  229.05  229.05
3  229.35  229.35  229.10  229.10  229.15  229.0  229.05  229.05  229.00

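Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.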

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM