Create new column with a group ID that changes based on the value of another column

I have a dataframe with a bunch of Q&A sessions. Each time the speaker changes, the dataframe has a new row. I'm trying to assign question characteristics to the answers, so I want to create an ID for each question-answer group. In the example below, I want to increment the ID each time a new question is asked (speakertype_id == 3 => question; speakertype_id == 4 => answer). I currently loop through the dataframe like so:

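import pandas as pd

Q_A = pd.DataFrame({'qna_id': [9]*10,
                    'qnacomponentid': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'speakertype_id': [3, 4, 3, 4, 4, 4, 3, 4, 3, 4]})

# Start a new group each time a question (speakertype_id == 3) appears
group = [0]*len(Q_A)
j = 1
for index, row in enumerate(Q_A.itertuples()):
    if row.speakertype_id == 3:  # same field as row[3]; row[0] is the index
        j += 1
    group[index] = j

Q_A['group'] = group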

This gives me the desired output and is much faster than I expected, but this post makes me question whether I should ever iterate over a pandas dataframe. Any thoughts on a better method? Thanks.

Edit: Expected output:

qna_id  qnacomponentid  speakertype_id  group
     9               3               3      2
     9               4               4      2
     9               5               3      3
     9               6               4      3
     9               7               4      3
     9               8               4      3
     9               9               3      4
     9              10               4      4
     9              11               3      5
     9              12               4      5

You can use eq and cumsum like this:

Q_A['gr2'] = Q_A['speakertype_id'].eq(3).cumsum()
print(Q_A)
   qna_id  qnacomponentid  speakertype_id  group  gr2
0       9               3               3      2    1
1       9               4               4      2    1
2       9               5               3      3    2
3       9               6               4      3    2
4       9               7               4      3    2
5       9               8               4      3    2
6       9               9               3      4    3
7       9              10               4      4    3
8       9              11               3      5    4
9       9              12               4      5    4
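To see why this works, here is a minimal sketch that prints the two intermediate steps on the question's sample data: eq(3) builds a boolean mask that is True on every question row, and cumsum counts True as 1 and False as 0, so the running total rises by one at each question and stays flat across its answers. The names is_question and group_id are just illustrative.

```python
import pandas as pd

# Same sample data as in the question
Q_A = pd.DataFrame({'qna_id': [9] * 10,
                    'qnacomponentid': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'speakertype_id': [3, 4, 3, 4, 4, 4, 3, 4, 3, 4]})

# Step 1: boolean mask, True on every row that starts a new question
is_question = Q_A['speakertype_id'].eq(3)

# Step 2: running count of questions seen so far (True counts as 1)
group_id = is_question.cumsum()

print(is_question.tolist())
# [True, False, True, False, False, False, True, False, True, False]
print(group_id.tolist())
# [1, 1, 2, 2, 2, 2, 3, 3, 4, 4]
```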

Note: I'm not sure whether you have a reason to start at 2, but if that is a requirement, you can add 1 after the cumsum.
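As a quick check of the +1 adjustment, the sketch below runs the question's loop and the offset cumsum side by side on the sample data. Because the loop starts j at 1 and adds 1 at every question, its value on each row is always 1 plus the number of questions seen so far, which is exactly cumsum() + 1.

```python
import pandas as pd

Q_A = pd.DataFrame({'qna_id': [9] * 10,
                    'qnacomponentid': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'speakertype_id': [3, 4, 3, 4, 4, 4, 3, 4, 3, 4]})

# Loop from the question: counter starts at 1, so the first question gets 2
group = [0] * len(Q_A)
j = 1
for index, row in enumerate(Q_A.itertuples()):
    if row.speakertype_id == 3:
        j += 1
    group[index] = j
Q_A['group'] = group

# Vectorized version, offset by +1 so its numbering also starts at 2
Q_A['gr2'] = Q_A['speakertype_id'].eq(3).cumsum() + 1

print((Q_A['group'] == Q_A['gr2']).all())  # True
```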

I reproduced your expected output like this:

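# Number each run of a speaker type separately (question runs and answer runs), offset by 2
Q_A['cumsum'] = Q_A[Q_A.speakertype_id != Q_A.speakertype_id.shift()].groupby('speakertype_id').cumcount() + 2
# Repeated answers were filtered out above, so they are NaN here; fill them forward
Q_A['cumsum'] = Q_A['cumsum'].ffill().astype('int')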
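One caveat worth checking, assuming two questions can ever appear back to back in the real data (for example, two different analysts asking in a row): the shift/groupby/cumcount approach above only reacts to changes in speaker type, so a second consecutive question is not counted as a new group, whereas the loop and eq/cumsum both start a new group on every question. A minimal sketch with a made-up five-row frame:

```python
import pandas as pd

# Hypothetical session: two questions (speakertype_id == 3) arrive back to back
df = pd.DataFrame({'speakertype_id': [3, 3, 4, 3, 4]})

# eq + cumsum: every question starts a new group (matches the loop)
df['gr2'] = df['speakertype_id'].eq(3).cumsum() + 1

# shift/groupby/cumcount: only rows where the speaker type changes are counted
changes = df[df.speakertype_id != df.speakertype_id.shift()]
df['cumsum'] = changes.groupby('speakertype_id').cumcount() + 2
df['cumsum'] = df['cumsum'].ffill().astype('int')

print(df)
```

Here gr2 comes out as [2, 3, 3, 4, 4], while cumsum gives [2, 2, 2, 3, 3], merging the first two questions into one group.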

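Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.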