Concatenating multi-indexed information within a pandas DataFrame
I have a multi-indexed DataFrame like the one below:
       col1 col2 col3 col4
row1 0    A    A    b    b
     1    B    B    c    c
row2 0    A    B    d    d
     1    B    B    e    e
I would like to know the most efficient way of concatenating the values under each outer index label, column by column (e.g. row1/col1, row1/col2, etc.), so that my result is:
     col1 col2 col3 col4
row1   AB   AB   bc   bc
row2   AB   BB   de   de
So far, the best (and only) way I can see to do this is:
dx = pd.concat(
    [
        df[col].unstack().apply(lambda row: row.str.cat(sep=''), axis=1)
        for col in df.columns
    ],
    axis=1,
)
dx.columns = df.columns
In practice, this particular DataFrame is 1.5 million rows by 1,000 columns, so a more efficient way of iterating through it would be most welcome!
Strings are compatible with sum (adding two strings concatenates them), so this can be done simply by grouping on the first level of the index:
df.groupby(level=0).apply(sum)
Out[37]:
     col1 col2 col3 col4
row1   AB   AB   bc   bc
row2   AB   BB   de   de
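
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and collapses the inner index level. Note that it is a variant of the line above, not the same call: it passes "".join to agg instead of the built-in sum to apply, on the assumption that an explicit string join is less dependent on how a given pandas version treats built-in callables (newer releases may warn about, or change, the handling of a bare sum). The names index and result are only illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question: outer level row1/row2, inner level 0/1.
index = pd.MultiIndex.from_product([["row1", "row2"], [0, 1]])
df = pd.DataFrame(
    {
        "col1": ["A", "B", "A", "B"],
        "col2": ["A", "B", "B", "B"],
        "col3": ["b", "c", "d", "e"],
        "col4": ["b", "c", "d", "e"],
    },
    index=index,
)

# Group on the first index level and join the strings of each column per group.
# Rows inside a group keep their original order, so row1/col1 becomes "A" + "B".
result = df.groupby(level=0).agg("".join)

print(result)
```

The result has one row per outer index label and the same columns as df, which matches the shape asked for in the question. I have not timed this against the unstack approach on a frame of the size mentioned, so treat it as a starting point rather than a benchmark.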