Concatenating multi-indexed information within a pandas DataFrame

I have a multi-indexed dataframe like below:

       col1 col2 col3 col4
 row1 0    A    A    b    b
      1    B    B    c    c
 row2 0    A    B    d    d
      1    B    B    e    e
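
For anyone wanting to reproduce this, here is a minimal sketch of how the frame above could be built. The inner level values 0 and 1, the use of `MultiIndex.from_product`, and the explicit `dtype=object` are assumptions read off the printout; the real data presumably comes from elsewhere.

```python
import pandas as pd

# Two-level index: outer labels row1/row2, inner positions 0/1
# (assumed from the printout in the question).
index = pd.MultiIndex.from_product([["row1", "row2"], [0, 1]])

df = pd.DataFrame(
    [
        ["A", "A", "b", "b"],
        ["B", "B", "c", "c"],
        ["A", "B", "d", "d"],
        ["B", "B", "e", "e"],
    ],
    index=index,
    columns=["col1", "col2", "col3", "col4"],
    dtype=object,  # keep the cells as plain Python strings
)

print(df)
```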

and would like to know the most efficient way of concatenating the information, e.g. for row1+col1, row1+col2, etc., such that my result will be:

              col1  col2  col3  col4
row1            AB    AB    bc    bc
row2            AB    BB    de    de

So far, the best (and only) way I can see to do this is:

dx = pd.concat(
    [df[col].unstack().apply(lambda row: row.str.cat(sep=''),axis=1) 
        for col in df.columns],
    axis=1,
)

dx.columns = df.columns

In practice, this particular dataframe is 1.5 million rows by 1,000 columns in size, so a more efficient way of iterating through it would be most welcome!

Strings support addition (which concatenates them), so summing within each group of the first index level does the job:

df.groupby(level=0).apply(sum)
Out[37]: 
     col1 col2 col3 col4
row1   AB   AB   bc   bc
row2   AB   BB   de   de
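
As a self-contained sketch of the same idea, the snippet below rebuilds the small example frame (an assumption based on the printout in the question) and calls the group-wise `sum()` method directly instead of passing the builtin `sum` to `apply`. As far as I know, recent pandas versions emit a deprecation warning for the `apply(sum)` spelling, so the direct call is probably the safer form.

```python
import pandas as pd

# Example frame, reconstructed from the question's printout (assumed layout).
index = pd.MultiIndex.from_product([["row1", "row2"], [0, 1]])
df = pd.DataFrame(
    [
        ["A", "A", "b", "b"],
        ["B", "B", "c", "c"],
        ["A", "B", "d", "d"],
        ["B", "B", "e", "e"],
    ],
    index=index,
    columns=["col1", "col2", "col3", "col4"],
    dtype=object,  # plain Python strings, so "+" means concatenation
)

# Group on the outer index level and add the strings within each group.
# Rows keep their original order inside a group, so row1 gives "A" + "B".
result = df.groupby(level=0).sum()

print(result)
```

Whether this is fast enough at 1.5 million rows by 1,000 columns has not been measured here, so it is worth timing on a slice of the real data first.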

