What's the fastest way to loop through a DataFrame and count occurrences within the DataFrame whilst some condition is fulfilled (in Python)?

I have a dataframe with two Boolean fields (as below).

import pandas as pd

d = [{'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':False}, {'a1':False, 'a2':True},
     {'a1': False, 'a2': False}, {'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':True}, {'a1':False, 'a2':False},]

df = pd.DataFrame(d)
df

Out[1]: 
      a1     a2
0  False  False
1   True  False
2   True  False
3  False  False
4  False   True
5  False  False
6  False  False
7   True  False
8  False   True
9  False  False

I am trying to find the fastest and most "Pythonic" way of achieving the following:

  • If a1==True, count the rows, starting at the current row, for which a2==False (e.g. row 1: a1 is True and a2 is False for three rows starting at row 1)
  • Stop counting at the first row where a2==True (e.g. row 4, so count = 3)
  • Write the count to a new column 'a3' on the row where counting began (e.g. 'a3' = 3 on row 1)

Target result set as follows.

      a1     a2  a3
0  False  False   0
1   True  False   3
2   True  False   2
3  False  False   0
4  False   True   0
5  False  False   0
6  False  False   0
7   True  False   1
8  False   True   0
9  False  False   0

I have been trying to accomplish this using for loops, iterrows and while loops and so far haven't been able to produce a good nested combination which provides the results I want. Any help appreciated. I apologize if the problem is not totally clear.

How about this:

df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)

So, if a1 is False, write 0; otherwise write the length of the slice of a2 that runs from the current row up to, but not including, the next True.
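One caveat with the one-liner: list.index(True) raises ValueError when no True follows the current row, and x.name is used as a position, so it relies on the default integer index. A minimal sketch of the same idea that tolerates a missing True is below. It assumes the count should then run to the end of the frame, which the question does not specify; the helper name count_false_run and the small demo frame are made up for illustration.

```python
import pandas as pd

def count_false_run(values, start):
    """Count consecutive False entries in `values`, beginning at position `start`."""
    count = 0
    for value in values[start:]:
        if value:          # the first True ends the run
            break
        count += 1
    return count

# Hypothetical frame in which a2 never becomes True
demo = pd.DataFrame({'a1': [True, False, True],
                     'a2': [False, False, False]})

# Convert once to a plain list so each row does not rebuild it
a2_values = demo['a2'].tolist()
demo['a3'] = [count_false_run(a2_values, pos) if flag else 0
              for pos, flag in enumerate(demo['a1'])]
```

Because enumerate supplies positions rather than index labels, this version should also work when the frame does not have the default 0..n-1 index.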

This will do the trick:

df['a3'] = 0
n = len(df)
# loop through every value of 'a1'
for i in range(n):
    # if 'a1' at position i is True...
    if df.loc[i, 'a1']:
        count = 0
        # loop over the remaining items in 'a2'
        # remaining: n - i
        # i: position of the True value in 'a1'
        for j in range(n - i):
            # if the value of 'a2' is False...
            if not df.loc[j + i, 'a2']:
                # count the consecutive False values...
                count += 1
            else:
                # ... and break the loop at the first True
                break
        # write the count to the right position (i) in 'a3'
        # (.loc avoids chained assignment; labels equal positions with the default index)
        df.loc[i, 'a3'] = count

This produces the following output:

      a1     a2  a3
0  False  False   0
1   True  False   3
2   True  False   2
3  False  False   0
4  False   True   0
5  False  False   0
6  False  False   0
7   True  False   1
8  False   True   0
9  False  False   0

Edit: added comments in the code
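Since the question asks for the fastest approach, here is a vectorized sketch that avoids Python-level loops. It is not taken from either answer above: for every row it finds the position of the nearest True in a2 at or after that row using a reversed running minimum, then subtracts the row position. It assumes that rows with no later True should count to the end of the frame, as the loop version does.

```python
import numpy as np
import pandas as pd

d = [{'a1': False, 'a2': False}, {'a1': True, 'a2': False}, {'a1': True, 'a2': False},
     {'a1': False, 'a2': False}, {'a1': False, 'a2': True}, {'a1': False, 'a2': False},
     {'a1': False, 'a2': False}, {'a1': True, 'a2': False}, {'a1': False, 'a2': True},
     {'a1': False, 'a2': False}]
df = pd.DataFrame(d)

a1 = df['a1'].to_numpy()
a2 = df['a2'].to_numpy()
n = len(df)
pos = np.arange(n)

# Position of each True in a2; rows where a2 is False get the sentinel n
true_pos = np.where(a2, pos, n)

# Running minimum taken from the end: nearest True at or after each row
next_true = np.minimum.accumulate(true_pos[::-1])[::-1]

# Distance to that True is the number of consecutive False rows; 0 where a1 is False
df['a3'] = np.where(a1, next_true - pos, 0)
```

On the sample data this gives the same a3 column as the target result in the question. The work is a handful of linear passes over the arrays, whereas the apply and nested-loop versions rescan a2 for every row where a1 is True.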
