
Iterate over a groupby DataFrame to operate on each row

I have a DataFrame like this:

    subject  trial  attended
0         1      1         1
1         1      3         0
2         1      4         1
3         1      7         0
4         1      8         1
5         2      1         1
6         2      2         1
7         2      6         1
8         2      8         0
9         2      9         1
10        2     11         1
11        2     12         1
12        2     13         1
13        2     14         1
14        2     15         1
  1. I would like to group by subject.
  2. Then iterate over each row within each group.
  3. If 'attended' == 1 for a row, increase a variable sum_reactive by 1.
  4. When sum_reactive reaches 4, add to a dictionary the 'subject' and 'trial' at which sum_reactive reached 4.

I am trying to define a function for this, but it doesn't work:

def count_attended():
    sum_reactive = 0
    dict_attended = {}
    for i, g in reactive.groupby(['subject']):
        for row in g:
            if g['attended'][row] == 1:
                sum_reactive += 1
                if sum_reactive == 4:
                   dict_attended.update({g['subject'] : g['trial'][row]})
                   return dict_attended

    return dict_attended

I think I'm not clear on how to iterate inside each GroupBy dataframe. I'm quite new to pandas.
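For reference, iterating over a group directly (`for row in g`) yields column labels rather than rows, which is why the inner loop above never sees the row values. Below is a minimal sketch of a row-wise version. It assumes the counter should restart for each subject and that the frame is passed in as an argument; neither is stated in the question, and the `target` parameter is introduced here only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
reactive = pd.DataFrame({
    'subject': [1] * 5 + [2] * 10,
    'trial': [1, 3, 4, 7, 8, 1, 2, 6, 8, 9, 11, 12, 13, 14, 15],
    'attended': [1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
})


def count_attended(df, target=4):
    """Map each subject to the trial where its attended count reaches `target`."""
    dict_attended = {}
    for subject, g in df.groupby('subject'):
        sum_reactive = 0  # assumption: the count restarts for every subject
        # itertuples yields one row at a time as a namedtuple
        for row in g.itertuples(index=False):
            if row.attended == 1:
                sum_reactive += 1
                if sum_reactive == target:
                    dict_attended[subject] = row.trial
                    break  # stop scanning this subject, go to the next one
    return dict_attended


result = count_attended(reactive)
print(result)
```

With the sample data, subject 2 maps to trial 9, while subject 1 is absent from the result because it has only three attended trials.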

IIUC, try:

df = df.query('attended == 1')
df.loc[df.groupby('subject')['attended'].cumsum() == 4, ['subject', 'trial']].to_dict(orient='records')

Output:

[{'subject': 2, 'trial': 9}]

Using groupby with cumsum keeps a running count of attended trials within each subject; comparing that count to 4 produces a boolean Series. You can use this boolean Series for boolean indexing to filter the DataFrame down to the matching rows. Lastly, loc with a column list selects subject and trial.
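To make the intermediate steps visible, here is a sketch that splits the one-liner into named pieces, using the sample data from the question. The variable names (`attended`, `running`, `mask`, `hits`, `by_subject`) are chosen here for illustration, and the final subject-to-trial dictionary is an optional extra step, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'subject': [1] * 5 + [2] * 10,
    'trial': [1, 3, 4, 7, 8, 1, 2, 6, 8, 9, 11, 12, 13, 14, 15],
    'attended': [1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
})

# Keep only attended rows (same effect as df.query('attended == 1')).
attended = df[df['attended'] == 1]

# Running count of attended trials, restarting for each subject.
running = attended.groupby('subject')['attended'].cumsum()

# Boolean Series: True only on the row where the count reaches 4.
mask = running == 4

# Boolean indexing on rows plus column selection with loc.
hits = attended.loc[mask, ['subject', 'trial']]
records = hits.to_dict(orient='records')

# Optional: a plain {subject: trial} dictionary.
by_subject = dict(zip(hits['subject'].tolist(), hits['trial'].tolist()))

print(records)
print(by_subject)
```

Because the unattended rows are dropped first, each remaining row adds exactly 1 to the running count, so the count equals 4 on at most one row per subject.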
