What's the fastest way to loop through a DataFrame and count occurrences within the DataFrame while some condition is fulfilled (in Python)?

I have a DataFrame with two Boolean fields (as below).

import pandas as pd

d = [{'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':False}, {'a1':False, 'a2':True},
     {'a1': False, 'a2': False}, {'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':True}, {'a1':False, 'a2':False},]

df = pd.DataFrame(d)
df

Out[1]: 
      a1     a2
0  False  False
1   True  False
2   True  False
3  False  False
4  False   True
5  False  False
6  False  False
7   True  False
8  False   True
9  False  False

I am trying to find the fastest and most "Pythonic" way of achieving the following:

  • If a1==True, count the rows, starting from the current row, where a2==False (e.g. row 1: a1 is True, and a2 is False for three rows starting at row 1)
  • At the first instance of a2==True, stop counting (e.g. at row 4, count = 3)
  • Write the value of 'count' to a new df column 'a3' on the row where counting began (e.g. 'a3' = 3 on row 1)

The target result set is as follows.

      a1     a2  a3
0  False  False   0
1   True  False   3
2   True  False   2
3  False  False   0
4  False   True   0
5  False  False   0
6  False  False   0
7   True  False   1
8  False   True   0
9  False  False   0

I have been trying to accomplish this using for loops, iterrows and while loops, and so far I haven't been able to produce a nested combination that gives the results I want. Any help is appreciated. I apologize if the problem is not totally clear.

How about this:

df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)

So, if a1 is False, write 0; otherwise write the length of the slice of a2 that runs from that row up to the next True.
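The same idea can be spelled out as a small helper, which may be easier to read than the one-liner. This is only a sketch: the function name falses_until_next_true is made up for illustration, it works on row positions (so it assumes nothing about the index labels), and it assumes that when no True follows in a2 the count should run to the end of the column, a case the question does not specify.

```python
import pandas as pd

def falses_until_next_true(a2_values, start):
    """Count entries from position `start` up to, but not including, the next True."""
    tail = a2_values[start:]
    try:
        # Position of the first True in the tail equals the number of Falses before it.
        return tail.index(True)
    except ValueError:
        # Assumption: no True follows, so every remaining row is counted.
        return len(tail)

df = pd.DataFrame({
    "a1": [False, True, True, False, False, False, False, True, False, False],
    "a2": [False, False, False, False, True, False, False, False, True, False],
})

a2_values = df["a2"].tolist()  # plain Python bools, converted once
df["a3"] = [
    falses_until_next_true(a2_values, i) if flag else 0
    for i, flag in enumerate(df["a1"])
]
```

Converting a2 to a list once, outside the loop, avoids rebuilding it for every row as the one-liner does.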

This will do the trick:

df['a3'] = 0
# loop through every position in 'a1'
for i in range(len(df)):
    # if 'a1' at position i is True...
    if df['a1'][i]:
        count = 0
        # loop over the remaining items in 'a2', starting at position i
        for j in range(i, len(df)):
            # if the value of 'a2' is False...
            if not df['a2'][j]:
                # ...count the consecutive False values...
                count += 1
            else:
                # ...and stop at the first True
                break
        # write the number of occurrences to row i of 'a3'
        # (.loc avoids the chained-assignment problem of df['a3'][i] = count)
        df.loc[i, 'a3'] = count

It produces the following output:

      a1     a2  a3
0  False  False   0
1   True  False   3
2   True  False   2
3  False  False   0
4  False   True   0
5  False  False   0
6  False  False   0
7   True  False   1
8  False   True   0
9  False  False   0

Edit: added comments in the code
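Since the question asks for the fastest way, a loop-free variant is worth sketching as well. This is not taken from either answer above: it is one possible NumPy approach, the function name count_false_run is hypothetical, and it assumes that a row with no later True in a2 counts through to the end of the column. For each row it finds the position of the next True in a2 (at or after that row) with a reversed running minimum, then subtracts the row's own position.

```python
import numpy as np
import pandas as pd

def count_false_run(a1, a2):
    """For rows where a1 is True, count a2 == False rows up to the next True in a2."""
    a1 = np.asarray(a1, dtype=bool)
    a2 = np.asarray(a2, dtype=bool)
    n = len(a2)
    positions = np.arange(n)
    # Position of each True in a2; rows that are False get the sentinel n.
    true_pos = np.where(a2, positions, n)
    # Running minimum from the end gives, per row, the next True at or after it.
    next_true = np.minimum.accumulate(true_pos[::-1])[::-1]
    # The distance to that True is the count; rows with a1 False get 0.
    return np.where(a1, next_true - positions, 0)

df = pd.DataFrame({
    "a1": [False, True, True, False, False, False, False, True, False, False],
    "a2": [False, False, False, False, True, False, False, False, True, False],
})
df["a3"] = count_false_run(df["a1"], df["a2"])
```

Every step is a whole-array operation, so the work grows linearly with the number of rows instead of rescanning a2 for each True in a1.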
