I'm trying to group records and count those that meet certain conditions, using Python (pandas).
The sample data is shown below. I want to create a new column 'phone_cnt' that holds the number of calls meeting the following conditions: first, find the phone numbers that have at least one dept=0 record; then, for each such number, count the calls that happen AFTER its first dept=0 call.
import numpy as np
import pandas as pd

np.random.seed(0)
# create 17 hourly timestamps starting at '2021-04-01'
rng = pd.date_range('2021-04-01', periods=17, freq='H')
df = pd.DataFrame({'time': rng, 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],
                   'phone': [881,453,453,111,347,767,767,980,767,453,453,767,767,687,321,243,243],
                   'dept': [1,0,1,1,1,1,0,0,0,0,1,1,1,1,1,0,1]})
df
Expected results: phone 243 has phone_cnt=1, 453 has phone_cnt=3, 767 has phone_cnt=3, and 980 has phone_cnt=0.
I've tried the steps below. The first 2 steps work, but step 3 is wrong.
# step 1: create a list of unique phone numbers which have dept=0 in records
phonelist = df[df['dept']==0].phone.unique()
# step 2: find all the calls made from the phone numbers above, sorted by phone and time
df1 = df[df['phone'].isin(phonelist)].sort_values(by = ['phone','time'], ascending = [True, True])
df1
# step 3: count the number of calls in df1 that happened after the dept=0 call for each number
# (this attempt does not give the expected counts)
df2 = df1.groupby('phone')['time'].apply(lambda x: (x > df[df['dept']==0].time).sum()).reset_index(name='count')
Can anyone help me? Thank you!!
Here is a way to continue from where you left off at df1, using itertools.dropwhile:
from itertools import dropwhile
is_nonzero = lambda x: x != 0
df1.groupby("phone").dept.apply(lambda gr: len(list(dropwhile(is_nonzero, gr))) - 1)
gives
phone
243 1
453 3
767 3
980 0
Name: dept, dtype: int64
dropwhile drops values for as long as its predicate (i.e. nonzero-ness in this case) holds. This leaves a cropped group that starts at the first 0 and keeps every element after it. What we need is the length of that cropped group minus 1. Since dropwhile returns a lazy iterator, we call list on it first and then len. (The -1 at the end is there because the desired calls are the ones after the first 0.) Note that this relies on df1 being sorted by time within each phone, as done in step 2.
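To make the steps above easier to check, here is a minimal self-contained sketch. It rebuilds the sample data, traces dropwhile on the group for phone 767, and then shows a pandas-only alternative that marks rows at or after the first dept=0 call with a grouped cummax. The cummax variant and the names phones_with_zero, seen_zero, counts and dropwhile_counts are my own additions, not part of the answer above. Mapping the counts back onto df as 'phone_cnt' leaves NaN for phones that never have a dept=0 call, which is an assumption about what the question wants.

```python
from itertools import dropwhile

import pandas as pd

# Sample data from the question: 17 hourly calls starting at 2021-04-01.
times = pd.to_datetime("2021-04-01") + pd.to_timedelta(list(range(17)), unit="h")
df = pd.DataFrame({
    "time": times,
    "id": list(range(1, 18)),
    "phone": [881, 453, 453, 111, 347, 767, 767, 980, 767,
              453, 453, 767, 767, 687, 321, 243, 243],
    "dept": [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1],
})

# Steps 1 and 2 from the question: keep phones with at least one dept == 0
# call, ordered by phone and then by time.
phones_with_zero = df.loc[df["dept"] == 0, "phone"].unique()
df1 = df[df["phone"].isin(phones_with_zero)].sort_values(["phone", "time"])

# Trace dropwhile on a single group: phone 767 has dept values
# [1, 0, 0, 1, 1] in time order, so the leading 1 is dropped.
dept_767 = df1.loc[df1["phone"] == 767, "dept"].tolist()
tail_767 = list(dropwhile(lambda d: d != 0, dept_767))

# The dropwhile approach applied to every group.
dropwhile_counts = df1.groupby("phone")["dept"].apply(
    lambda g: len(list(dropwhile(lambda d: d != 0, g))) - 1
)

# Pandas-only alternative (an assumption, not from the answer): flag each
# dept == 0 row, take a running max per phone so that every row from the
# first 0 onward is 1, sum the flags, and subtract 1 for the first 0 itself.
seen_zero = df1["dept"].eq(0).astype(int).groupby(df1["phone"]).cummax()
counts = seen_zero.groupby(df1["phone"]).sum() - 1

# Attach the counts to the original frame; phones without any dept == 0
# call are not in `counts` and therefore get NaN here.
df["phone_cnt"] = df["phone"].map(counts)

print(counts)
```

Both approaches depend on the rows being in time order within each phone, so the sort in step 2 should not be skipped.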