How to iterate over multiple columns in a Pandas dataframe?

Question

I have the following dataframe, where one column denotes the ID (0 or 1) of the speaker for each second of a conversation and the other column denotes the seconds elapsed in that conversation.

myDF = pd.DataFrame({'ID': [0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1], 'seconds': (np.arange(16))})

-------------------------------
   ID               seconds
-------------------------------
   0                   0
   0                   1
   0                   2
   0                   3
   1                   4
   1                   5
   1                   6
   1                   7
   0                   8
   0                   9
   0                   10
   0                   11
   1                   12
   1                   13
   1                   14
   1                   15
-------------------------------

I am only interested in speaker ID 1, whose speech runs from seconds 4 to 7 and from seconds 12 to 15. What I want to generate is a separate dataframe that contains the start and end of each speaker 1 speech segment, where each row is an uninterrupted period of speech. Something like this:

--------------------------------
  start              end       
--------------------------------
    4                 7        
    12                15
--------------------------------

I have some non-working pseudo-code that hopefully outlines what I am trying to achieve, but so far I cannot find the right solution. In essence, for each row where the ID is 1, I compare the ID with the previous row (a change in ID denotes the start of speech) and add the corresponding seconds value to the bdry dataframe. Similarly, I compare each ID with the next row (a change there denotes the end of speech).

bdry = pd.DataFrame(columns=['start','end'])

for i in myDF:
    if i['ID'] == 1:
        if i.ID != i['ID'].shift(): # compare ID with previous
            bdry['start'].append(i['seconds'])
        if i.ID != i['ID'].shift(-1): # compare ID with next
            bdry['end'].append(i['seconds'])

Answer 1

import numpy as np
import pandas as pd
from itertools import groupby


myDF = pd.DataFrame({'ID': [0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1], 'seconds': (np.arange(16))})

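# m is a boolean mask: True on rows where speaker 1 is talking.
tmp, m = [], myDF['ID'] == 1
# groupby merges only adjacent equal keys, so each uninterrupted run is one group.
for v, g in groupby(zip(myDF['seconds'], m), lambda k: k[1]):
    if v:
        g = list(g)
        # Store the first and last seconds value of this speaker 1 run.
        tmp.append((g[0][0], g[-1][0]))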

df = pd.DataFrame(tmp, columns=['start', 'end'])
print(df)

Prints:

   start  end
0      4    7
1     12   15
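
The answer relies on itertools.groupby grouping only adjacent equal keys. As a minimal sketch of that behaviour on a plain list (the names ids and runs are illustrative and not part of the answer), each run of identical IDs comes out as its own group with its first and last position:

```python
from itertools import groupby

ids = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

# groupby merges only *adjacent* equal keys, so the two separate
# stretches of speaker 1 are reported as two distinct groups.
runs = []
for key, group in groupby(enumerate(ids), key=lambda pair: pair[1]):
    positions = [pos for pos, _ in group]
    runs.append((key, positions[0], positions[-1]))

print(runs)
# [(0, 0, 3), (1, 4, 7), (0, 8, 11), (1, 12, 15)]
```

Keeping only the tuples whose key is 1 gives the (start, end) pairs shown above.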

Answer 2

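Use the following code. The grouping key (myDF.ID != myDF.ID.shift()).cumsum() increases by one each time the ID changes, so every uninterrupted run of a speaker forms one group; the first and last seconds of each group are that run's start and end, and only the groups with ID 1 are kept: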

result = myDF.groupby((myDF.ID != myDF.ID.shift()).cumsum()).agg(
    ID=('ID', 'first'), start=('seconds', 'first'), end=('seconds', 'last'))\
    .query('ID == 1').drop(columns='ID').reset_index(drop=True)

For your data sample the result is:

   start  end
0      4    7
1     12   15
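
To see what the grouping key looks like, here is a small sketch that prints it for the sample data. The intermediate names changed and run_id are introduced here for illustration only; the answer computes the same values inline.

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame({'ID': [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                     'seconds': np.arange(16)})

# True wherever the ID differs from the previous row. The first row is
# always True because shift() leaves NaN there.
changed = myDF['ID'] != myDF['ID'].shift()

# Running count of changes: rows in the same uninterrupted run share a label.
run_id = changed.cumsum()

print(run_id.tolist())
# [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
```

Runs 2 and 4 are the ones with ID 1, which is why the query step leaves exactly two rows.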
