What is the fastest way to loop through a DataFrame and count occurrences within it while some condition is fulfilled (in Python)?

Question

I have a dataframe with two Boolean fields (as below).

import pandas as pd

d = [{'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':False}, {'a1':False, 'a2':True},
     {'a1': False, 'a2': False}, {'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':True}, {'a1':False, 'a2':False},]

df = pd.DataFrame(d)
df

Out[1]: 
      a1     a2
0  False  False
1   True  False
2   True  False
3  False  False
4  False   True
5  False  False
6  False  False
7   True  False
8  False   True
9  False  False

I am trying to find the fastest and most "Pythonic" way of achieving the following:

If a1 == True, count the rows from the current row onward where a2 == False (e.g. row 1: a1 is True, and a2 is False for three consecutive rows starting at row 1)
At the first instance of a2 == True, stop counting (e.g. at row 4, count = 3)
Write the value of 'count' to a new df column 'a3' on the row where counting began (e.g. 'a3' = 3 on row 1)

The target result is as follows.

      a1     a2  a3
0  False  False   0
1   True  False   3
2   True  False   2
3  False  False   0
4  False   True   0
5  False  False   0
6  False  False   0
7   True  False   1
8  False   True   0
9  False  False   0

I have been trying to accomplish this using for loops, iterrows and while loops, and so far I haven't been able to produce a good nested combination that gives the results I want. Any help is appreciated. I apologize if the problem is not totally clear.

Answer 1

How about this:

df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)

So, if a1 is False, write 0; otherwise write the length of the slice of a2 that runs from that row up to (but not including) the next True.
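One caveat with the one-liner: list.index(True) raises ValueError when there is no later True in a2 (for example, if a1 were True on row 9 of the sample data), and x.name is only a valid position when the frame has the default integer index. Below is a minimal sketch of the same idea that works on positions and treats the end of the frame as the stopping point. The helper name count_until_true is hypothetical, and the end-of-frame behavior is an assumption, since the question does not say what should happen in that case.

```python
import pandas as pd


def count_until_true(df):
    """For each row where a1 is True, count rows from there up to the next True in a2."""
    a1 = df["a1"].tolist()
    a2 = df["a2"].tolist()
    counts = []
    for pos, started in enumerate(a1):
        if not started:
            counts.append(0)
            continue
        try:
            # position of the first True in a2 at or after pos
            stop = a2.index(True, pos)
        except ValueError:
            # assumption: with no later True, count to the end of the frame
            stop = len(a2)
        counts.append(stop - pos)
    return counts


df = pd.DataFrame({
    "a1": [False, True, True, False, False, False, False, True, False, False],
    "a2": [False, False, False, False, True, False, False, False, True, False],
})
df["a3"] = count_until_true(df)
```

Converting both columns to lists once, instead of calling tolist() inside every apply call, also avoids rebuilding the list for each row.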

Answer 2

This will do the trick:

df['a3'] = 0
n = len(df)
# loop through every row position of 'a1'
for i in range(n):
    # if 'a1' at position i is True...
    if df['a1'][i]:
        count = 0
        # loop over the remaining items in 'a2', starting at position i
        for j in range(i, n):
            # if the value of 'a2' is False...
            if not df['a2'][j]:
                # ...count it as part of the current run of False values
                count += 1
            else:
                # ...otherwise the run has ended, so stop counting
                break
        # write the count to row i of 'a3'
        # (.loc avoids chained assignment, which may not update df)
        df.loc[i, 'a3'] = count

This produces the following output:

      a1     a2  a3
0  False  False   0
1   True  False   3
2   True  False   2
3  False  False   0
4  False   True   0
5  False  False   0
6  False  False   0
7   True  False   1
8  False   True   0
9  False  False   0

Edit: added comments in the code
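Both answers rescan a2 from every row where a1 is True, which can become quadratic on long frames. Since the question asks for the fastest approach, the same counts can probably be computed in a single vectorized pass with NumPy. The following is a sketch rather than a benchmarked solution: the function name count_false_run_vectorized is hypothetical, and it assumes that when no later True exists in a2 the count runs to the end of the frame, as the nested loop above does.

```python
import numpy as np
import pandas as pd


def count_false_run_vectorized(a1, a2):
    """Return, for each position where a1 is True, the distance to the next True in a2."""
    a1 = np.asarray(a1, dtype=bool)
    a2 = np.asarray(a2, dtype=bool)
    n = len(a2)
    positions = np.arange(n)
    # position of each True in a2; n is a sentinel meaning "no stop here"
    stop = np.where(a2, positions, n)
    # reverse cumulative minimum: nearest stop position at or after each row
    next_stop = np.minimum.accumulate(stop[::-1])[::-1]
    # distance to that stop where a1 is True, 0 everywhere else
    return np.where(a1, next_stop - positions, 0)


df = pd.DataFrame({
    "a1": [False, True, True, False, False, False, False, True, False, False],
    "a2": [False, False, False, False, True, False, False, False, True, False],
})
df["a3"] = count_false_run_vectorized(df["a1"], df["a2"])
print(df["a3"].tolist())  # [0, 3, 2, 0, 0, 0, 0, 1, 0, 0]
```

Because it works on positions rather than index labels, this version does not depend on the frame having the default integer index.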
