Pandas data frame combine rows

I have a large data frame that I would like to clean up. The two main problems are:

  1. The whole data frame is time-based. That means I cannot move rows around, because the timestamps would no longer match.

  2. The data is not always in the same order.

Here is an example to clarify:

index  a  b  c  d  x1  x2  y1  y2  t
0                  1   2           0.2
1      1  2                        0.4
2                          2   4   0.6
3                  1   2           1.8
4                          2   3   2.0
5                  1   2           3.8
6                          2   3   4.0
7            2  5                  4.2

The result should look like this:

index  a  b  c  d  x1  x2  y1  y2  t
0                  1   2   2   4   0.2
1      1  2                        0.4
3                  1   2   2   3   1.8
5                  1   2   2   3   3.8
7            2  5                  4.2

In other words, I would like to merge the rows of the right half of the df (the x and y columns) and keep the timestamp of the first entry. The second problem is that rows with data from the left half of the df may appear in between.
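For reproducibility, the example frame can be built roughly as follows. This is a sketch that assumes the blank cells in the table are NaN and that the index is the default 0..7 range; the variable name df follows the question's wording.

```python
import numpy as np
import pandas as pd

nan = np.nan  # blank cells in the table are assumed to be NaN

df = pd.DataFrame({
    'a':  [nan, 1, nan, nan, nan, nan, nan, nan],
    'b':  [nan, 2, nan, nan, nan, nan, nan, nan],
    'c':  [nan, nan, nan, nan, nan, nan, nan, 2],
    'd':  [nan, nan, nan, nan, nan, nan, nan, 5],
    'x1': [1, nan, nan, 1, nan, 1, nan, nan],
    'x2': [2, nan, nan, 2, nan, 2, nan, nan],
    'y1': [nan, nan, 2, nan, 2, nan, 2, nan],
    'y2': [nan, nan, 4, nan, 3, nan, 3, nan],
    't':  [0.2, 0.4, 0.6, 1.8, 2.0, 3.8, 4.0, 4.2],
})

print(df)
```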

This may not be the most general solution, but it solves your problem:

First, isolate the right half:

r = df[['x1', 'x2', 'y1', 'y2']].dropna(how='all')

Second, apply dropna column by column to compress the data:

r_compressed = r.apply(
    lambda g: g.dropna().reset_index(drop=True),
    axis=0
).set_index(r.index[::2])

The index has to be reset; otherwise pandas realigns the columns on their original labels and the gaps reappear. The original index is reapplied at the end, taking only every second label (r.index[::2]), so that the left half and the t column can be reinserted. Note that this assumes the x rows and y rows strictly alternate, starting with an x row.
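To see why the reset matters, here is a small sketch that compares the two variants on the right half of the example. The frame is typed in by hand with the labels that survive dropna(how='all'); the names realigned and compressed are only for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Right half of the example, keeping the original (gappy) index labels.
r = pd.DataFrame(
    {'x1': [1, nan, 1, nan, 1, nan],
     'x2': [2, nan, 2, nan, 2, nan],
     'y1': [nan, 2, nan, 2, nan, 2],
     'y2': [nan, 4, nan, 3, nan, 3]},
    index=[0, 2, 3, 4, 5, 6],
)

# Without reset_index: each column keeps its own labels, so pandas aligns
# the columns on the union of those labels and the NaN gaps come back.
realigned = r.apply(lambda g: g.dropna(), axis=0)

# With reset_index: every column is relabelled 0, 1, 2, so the values line up.
compressed = r.apply(lambda g: g.dropna().reset_index(drop=True), axis=0)

print(realigned.shape, int(realigned.isna().sum().sum()))
print(compressed.shape, int(compressed.isna().sum().sum()))
```

The first variant should still have six rows with half of the cells missing, while the second has three fully populated rows.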

Output (note the index values):

    x1   x2   y1   y2
0  1.0  2.0  2.0  4.0
3  1.0  2.0  2.0  3.0
5  1.0  2.0  2.0  3.0

Third, isolate the left half:

l = df[['a', 'b', 'c', 'd']].dropna(how='all')

Fourth, combine the left half and the t column with the compressed right half:

out = r_compressed.combine_first(l)
out['t'] = df['t']

Output:

     a    b    c    d   x1   x2   y1   y2    t
0  NaN  NaN  NaN  NaN  1.0  2.0  2.0  4.0  0.2
1  1.0  2.0  NaN  NaN  NaN  NaN  NaN  NaN  0.4
3  NaN  NaN  NaN  NaN  1.0  2.0  2.0  3.0  1.8
5  NaN  NaN  NaN  NaN  1.0  2.0  2.0  3.0  3.8
7  NaN  NaN  2.0  5.0  NaN  NaN  NaN  NaN  4.2
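Putting the four steps together, a self-contained sketch might look like this. The helper name combine_rows and the alternation check are my additions, not part of the steps above; the check only makes the assumption behind r.index[::2] explicit (x rows and y rows strictly alternate, x first). The example frame is rebuilt here with blanks assumed to be NaN.

```python
import numpy as np
import pandas as pd

n = np.nan
cols = ['a', 'b', 'c', 'd', 'x1', 'x2', 'y1', 'y2', 't']
df = pd.DataFrame([
    [n, n, n, n, 1, 2, n, n, 0.2],
    [1, 2, n, n, n, n, n, n, 0.4],
    [n, n, n, n, n, n, 2, 4, 0.6],
    [n, n, n, n, 1, 2, n, n, 1.8],
    [n, n, n, n, n, n, 2, 3, 2.0],
    [n, n, n, n, 1, 2, n, n, 3.8],
    [n, n, n, n, n, n, 2, 3, 4.0],
    [n, n, 2, 5, n, n, n, n, 4.2],
], columns=cols)


def combine_rows(df):
    """Hypothetical helper wrapping the four steps in one function."""
    right = df[['x1', 'x2', 'y1', 'y2']].dropna(how='all')
    left = df[['a', 'b', 'c', 'd']].dropna(how='all')

    # Added guard: the [::2] trick only works if x and y rows alternate.
    is_x = right['x1'].notna().to_numpy()
    alternating = is_x[::2].all() and not is_x[1::2].any()
    if len(right) % 2 != 0 or not alternating:
        raise ValueError('x rows and y rows do not strictly alternate')

    # Compress each column, then restore the label of each x row.
    compressed = right.apply(
        lambda g: g.dropna().reset_index(drop=True),
        axis=0,
    ).set_index(right.index[::2])

    # Bring back the left half and the timestamps of the kept rows.
    out = compressed.combine_first(left)
    out['t'] = df['t']
    return out


out = combine_rows(df)
print(out)
```

If the rows do not alternate cleanly (for example a y row is missing), the helper raises instead of silently pairing the wrong rows; handling that case would need a different grouping strategy.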
