
python group by and count based on time in another column

I'm trying to group records and count those that meet certain conditions, using Python (pandas).

The sample data is shown below. I want to create a new column 'phone_cnt' that shows the number of calls meeting the following conditions: first, find the phone numbers that have at least one dept=0 record; then, for each such number, count the calls that happen AFTER its first dept=0 call.


    import numpy as np
    import pandas as pd

    np.random.seed(0)
    # create 17 hourly timestamps starting at '2021-04-01'
    rng = pd.date_range('2021-04-01', periods=17, freq='H')
    df = pd.DataFrame({'time': rng, 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],
                       'phone': [881,453,453,111,347,767,767,980,767,453,453,767,767,687,321,243,243],
                       'dept': [1,0,1,1,1,1,0,0,0,0,1,1,1,1,1,0,1]})
    df

Expected results: phone 243 has phone_cnt=1, 453 has phone_cnt=3, 767 has phone_cnt=3, and 980 has phone_cnt=0.

I've tried the steps below. The first 2 steps work, but step 3 is wrong.


    # step 1: create a list of unique phone numbers which have dept=0 in records
    phonelist = df[df['dept']==0].phone.unique()
         
    # step 2: find all the calls from the above phone numbers
    df1 = df[df['phone'].isin(phonelist)].sort_values(by = ['phone','time'], ascending = [True, True])
    df1
        
    # step 3: count the number of calls in df1 that happened after the dept=0 call for each number
    df2 =df1.groupby('phone')['time'].apply(lambda x: x>df[df['dept']==0].time).sum()).reset_index(name='count')

Can anyone help me? Thank you!!

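Here is a way to continue from where you left off at df1, using itertools.dropwhile:

    from itertools import dropwhile

    # predicate: True while dept is non-zero
    is_nonzero = lambda x: x != 0
    # per phone: drop the calls before the first dept=0 call, then count what follows that call
    df1.groupby("phone").dept.apply(lambda gr: len(list(dropwhile(is_nonzero, gr))) - 1)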

gives

    phone
    243    1
    453    3
    767    3
    980    0
    Name: dept, dtype: int64

dropwhile drops values for as long as its predicate (i.e. non-zero-ness in this case) holds. This leaves a cropped group that starts at the first 0 and contains every element after it. What we want is the length of that cropped group minus 1. Since dropwhile returns a lazy iterator, we call list on it first and then len. (The -1 at the end is there because the desired calls are the ones after the first 0.)
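
To make the dropwhile behaviour concrete, here is a minimal sketch on a plain list (the dept values of phone 767 in time order), followed by a vectorized pandas alternative. The vectorized part is not from the answer above; it is one possible equivalent, and the names `seen_zero`, `rows_from_first_zero` and `counts` are made up for illustration. It assumes the same sample data as in the question.

```python
from itertools import dropwhile

import numpy as np
import pandas as pd

# 1) dropwhile on a plain list: dept values of phone 767, in time order.
dept_767 = [1, 0, 0, 1, 1]
kept = list(dropwhile(lambda x: x != 0, dept_767))
print(kept)           # [0, 0, 1, 1]
print(len(kept) - 1)  # 3

# 2) Hypothetical vectorized alternative on the question's sample data.
# Hourly timestamps are built from timedeltas to sidestep frequency aliases.
times = pd.Timestamp("2021-04-01") + pd.to_timedelta(np.arange(17), unit="h")
df = pd.DataFrame({
    "time": times,
    "id": list(range(1, 18)),
    "phone": [881, 453, 453, 111, 347, 767, 767, 980, 767,
              453, 453, 767, 767, 687, 321, 243, 243],
    "dept": [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1],
}).sort_values(["phone", "time"])

# 1 from the first dept=0 row of each phone onward, 0 before it.
seen_zero = df["dept"].eq(0).astype(int).groupby(df["phone"]).cummax()
rows_from_first_zero = seen_zero.groupby(df["phone"]).sum()

# Keep only phones that have a dept=0 call, then subtract that call itself.
counts = rows_from_first_zero[rows_from_first_zero > 0] - 1
print(counts.to_dict())  # {243: 1, 453: 3, 767: 3, 980: 0}
```

The vectorized version avoids a Python-level lambda per group, which may matter on large frames, but the dropwhile version is arguably easier to read.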
