How to merge rows in a dataframe based on the previous row?

Question

I have a sequentially ordered dataframe that represents two events measured over time; the measurements are the start and end times of each event. The events should alternate in an ABABAB sequence, but in some cases I may have consecutive events of the same type (e.g. ABABAABABB). I am looking for a way to compare the event label (A or B) in each row with the label in the previous row and, if they are the same, merge the two rows so that I keep the start time of the first event and the end time of the second. Consider the following:

import pandas as pd

myDF = pd.DataFrame({"Event": ["A","B","A","A","B","B","A"],
                     "Start": [1,3,5,7,9,11,13],
                     "End": [2,4,6,8,10,12,14]})

What I currently have:

==============================
  Event      Start      End
==============================
    A          1         2
    B          3         4
    A          5         6
    A          7         8
    B          9         10
    B          11        12
    A          13        14
==============================

What I need:

Note: The two A events at index positions 2-3 have been merged into one, as have the two B events originally at positions 4-5.

==============================
  Event      Start      End
==============================
    A          1         2
    B          3         4
    A          5         8
    B          9         12
    A          13        14
==============================

I had initially thought to use groupby, but I don't think this is right, as it would group over the entire dataframe. Similarly, I have tried using iteritems but have not had any success. Apologies for the lack of code, but I'm at a loss as to how to approach the problem.
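To illustrate the concern about groupby, here is a minimal sketch of what grouping on the Event column alone does with the sample data. The name pooled is only illustrative and is not part of the original question.

```python
import pandas as pd

myDF = pd.DataFrame({"Event": ["A", "B", "A", "A", "B", "B", "A"],
                     "Start": [1, 3, 5, 7, 9, 11, 13],
                     "End": [2, 4, 6, 8, 10, 12, 14]})

# Grouping on the label alone pools every A row and every B row,
# regardless of where they sit in the sequence.
pooled = myDF.groupby("Event").agg(Start=("Start", "first"), End=("End", "last"))
print(pooled)
```

The result has only two rows, one per label, so the ABAB ordering is lost. Merging only adjacent duplicates needs a grouping key that changes each time the label changes.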

Answer 1

You can use GroupBy.agg with first and last, grouping on a key that increments whenever Event differs from the previous row.

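# df is the question's dataframe (myDF above).
# g increments whenever Event differs from the previous row, so each run of equal labels shares one key.
g = df["Event"].ne(df["Event"].shift()).cumsum()
df.groupby(g, as_index=False).agg({
    "Event": "first",  # label of the run
    "Start": "first",  # start of the first row in the run
    "End": "last"      # end of the last row in the run
})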

  Event  Start  End
0     A      1    2
1     B      3    4
2     A      5    8
3     B      9   12
4     A     13   14
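To make the grouping key easier to follow, the sketch below splits it into its intermediate steps on the question's sample data. The names starts_new_run, run_id and merged are illustrative only, and it uses named aggregation plus reset_index instead of as_index=False, which is a stylistic variation rather than the answer's exact code.

```python
import pandas as pd

myDF = pd.DataFrame({"Event": ["A", "B", "A", "A", "B", "B", "A"],
                     "Start": [1, 3, 5, 7, 9, 11, 13],
                     "End": [2, 4, 6, 8, 10, 12, 14]})

# True wherever the label differs from the row above
# (the first row is compared against NaN, so it is True as well).
starts_new_run = myDF["Event"].ne(myDF["Event"].shift())

# A running total of those flags gives every run of equal labels its own id.
run_id = starts_new_run.cumsum().rename("run_id")
print(run_id.tolist())  # [1, 2, 3, 3, 4, 4, 5]

# Collapse each run: keep its label, its first Start and its last End.
merged = (
    myDF.groupby(run_id)
        .agg(Event=("Event", "first"), Start=("Start", "first"), End=("End", "last"))
        .reset_index(drop=True)
)
print(merged)
```

Rows 2 and 3 share run id 3 and rows 4 and 5 share run id 4, which is why only those pairs are merged.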

Answer 2

Another way is to loop over the rows:

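# Assumes myDF has the default 0..n-1 index, so label i-1 is the previous row.
for i in range(1, myDF.shape[0]):
    if myDF.loc[i, 'Event'] == myDF.loc[i-1, 'Event']:
        # Fold the previous row into the current one, then drop the previous row.
        myDF.loc[i, 'Start'] = min(myDF.loc[i, 'Start'], myDF.loc[i-1, 'Start'])
        myDF.loc[i, 'End'] = max(myDF.loc[i, 'End'], myDF.loc[i-1, 'End'])
        myDF.drop([i-1], inplace=True)
# Renumber the remaining rows 0..k-1.
myDF.reset_index(drop=True, inplace=True)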
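The loop keeps the smallest Start and the largest End of each run, rather than the first and last values. A rough sketch of the same rule without mutating the dataframe could look like the following; merge_runs and the events sample with a run of three A rows are hypothetical and not taken from either answer.

```python
import pandas as pd


def merge_runs(frame: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive rows that share an Event label (hypothetical helper)."""
    # Key that increments each time the label changes from the previous row.
    run_id = frame["Event"].ne(frame["Event"].shift()).cumsum().rename("run_id")
    return (
        frame.groupby(run_id)
             .agg(Event=("Event", "first"), Start=("Start", "min"), End=("End", "max"))
             .reset_index(drop=True)
    )


# Made-up sample with three consecutive A rows, to show longer runs are handled.
events = pd.DataFrame({"Event": ["A", "A", "A", "B", "A"],
                       "Start": [1, 3, 5, 7, 9],
                       "End": [2, 4, 6, 8, 10]})
result = merge_runs(events)
print(result)
```

When the rows are already sorted by time, min/max and first/last give the same result; they differ only if the rows within a run are out of order.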

Answer 1
1 accepted 2020-10-23 10:05:59

Answer 2
1 2020-10-23 10:13:38
