Concatenating multi-indexed information within a pandas dataframe

I have a multi-indexed dataframe like the one below:

       col1 col2 col3 col4
 row1 0    A    A    b    b
      1    B    B    c    c
 row2 0    A    B    d    d
      1    B    B    e    e

I would like to know the most efficient way of concatenating the strings within each first-level index group, column by column (e.g. row1/col1, row1/col2, etc.), so that my result is:

              col1  col2  col3  col4
row1            AB    AB    bc    bc
row2            AB    BB    de    de

So far, the best (and only) way I can see to do this is:

dx = pd.concat(
    [df[col].unstack().apply(lambda row: row.str.cat(sep=''),axis=1) 
        for col in df.columns],
    axis=1,
)

dx.columns = df.columns

In practice, this particular dataframe is 1.5 million rows by 1,000 columns, so a more efficient way of doing this would be most welcome!

Strings support addition, so summing them concatenates them. Grouping on the first level of the index and summing each group therefore gives the result directly:

df.groupby(level=0).apply(sum)
Out[37]: 
     col1 col2 col3 col4
row1   AB   AB   bc   bc
row2   AB   BB   de   de
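
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same grouping idea. The variable names (index, joined, summed) are mine, and the explicit dtype=object is an assumption made so the columns hold plain Python strings. It uses the groupby's own sum() rather than passing the builtin sum to apply, since recent pandas versions may warn about the builtin form, and it also shows an explicit string join as an alternative that does not rely on string addition.

```python
import pandas as pd

# Rebuild the example frame: first index level is row1/row2, second is 0/1.
index = pd.MultiIndex.from_product([["row1", "row2"], [0, 1]])
df = pd.DataFrame(
    [["A", "A", "b", "b"],
     ["B", "B", "c", "c"],
     ["A", "B", "d", "d"],
     ["B", "B", "e", "e"]],
    index=index,
    columns=["col1", "col2", "col3", "col4"],
    dtype=object,  # assumption: keep values as plain Python strings
)

# Option 1: sum within each first-level group; for strings this concatenates.
summed = df.groupby(level=0).sum()

# Option 2: join the strings of each column within each group explicitly.
joined = df.groupby(level=0).agg("".join)

print(summed)
print(joined)
```

Both variants preserve the row order within each group, so row1/col1 comes out as "AB". I have not timed either of them on a frame of the size mentioned in the question, so treat the choice between them as something to benchmark on your own data.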
